One Database Lab paper accepted by the EI-indexed Journal of Computer Research and Development

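Top-k Aggregation Keyword Search over Relational Databases

Zhang Dongzhan, Su Zhifeng, Lin Ziyu+, Xue Yongsheng

(Department of Computer Science, Xiamen University, Xiamen, Fujian 361005)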

(ziyulin@xmu.edu.cn)

Abstract: Keyword search over relational databases lets users query a database without knowing its schema or a structured query language. Given a keyword query, existing approaches find individual tuples that match the query keywords by following primary-foreign-key relationships in the database. In many real applications, however, users get more value from an aggregated result over such tuples, and existing methods cannot produce one. This paper therefore studies top-k aggregation keyword search over relational databases. We propose a recursion-based full search algorithm, RFS, that enumerates all aggregation cells. To achieve high performance, we further develop new ranking techniques, a keyword-tuple-based two-dimensional index, and a quick search algorithm, OQS, that efficiently identify the top-k aggregation cells. Extensive experiments on two large real datasets demonstrate the benefits of the approach, in particular the good query performance of OQS.

Key words: aggregation keyword search; relational databases; two-dimensional index; aggregation cell; ranking
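The abstract does not describe how RFS or the two-dimensional index work internally. As a rough illustration only, the Python sketch below enumerates group-by style aggregation cells recursively: each dimension attribute is either fixed to a value or aggregated away as `*`, and a cell qualifies when its tuples jointly contain every query keyword. The cell model, the inverted keyword-to-tuple-id map standing in for the keyword-tuple index, the specificity-based ranking, and all names (`build_keyword_tuple_index`, `enumerate_cells`, `top_k`) are hypothetical assumptions, not the paper's RFS or OQS.

```python
"""Illustrative sketch only: recursive enumeration of aggregation cells.

Assumptions (not taken from the paper): a cell fixes each dimension attribute
to a value or aggregates it away as '*'; a cell qualifies when its tuples
jointly contain all query keywords; cells are ranked by specificity.
"""

STAR = "*"  # marks a dimension that is aggregated away in a cell


def build_keyword_tuple_index(tuples, text_attrs):
    """Hypothetical keyword -> set-of-tuple-ids map (a sparse keyword x tuple matrix)."""
    index = {}
    for tid, row in enumerate(tuples):
        for attr in text_attrs:
            for word in row[attr].lower().split():
                index.setdefault(word, set()).add(tid)
    return index


def enumerate_cells(tuples, dims, keywords, index):
    """Return every (cell, tuple_ids) whose tuples jointly cover all keywords."""
    keyword_sets = [index.get(k.lower(), set()) for k in keywords]
    results = []

    def recurse(d, cell, tids):
        # Refining a cell only removes tuples, so if these tuples already miss
        # a keyword, no refinement can cover it: prune the whole subtree.
        if not all(tids & ks for ks in keyword_sets):
            return
        if d == len(dims):
            results.append((tuple(cell), frozenset(tids)))
            return
        # Branch 1: aggregate dimension d away.
        recurse(d + 1, cell + [STAR], tids)
        # Branch 2: fix dimension d to each value present among the current tuples.
        for v in sorted({tuples[t][dims[d]] for t in tids}):
            recurse(d + 1, cell + [v], {t for t in tids if tuples[t][dims[d]] == v})

    recurse(0, [], set(range(len(tuples))))
    return results


def top_k(cells, k):
    """Hypothetical ranking: more fixed dimensions first, then fewer tuples."""
    def key(item):
        cell, tids = item
        specificity = sum(1 for v in cell if v != STAR)
        return (-specificity, len(tids), cell)
    return [cell for cell, _ in sorted(cells, key=key)[:k]]


if __name__ == "__main__":
    rows = [
        {"month": "Jan", "city": "Xiamen", "desc": "laptop sale"},
        {"month": "Jan", "city": "Fuzhou", "desc": "phone discount"},
        {"month": "Feb", "city": "Xiamen", "desc": "phone sale"},
    ]
    idx = build_keyword_tuple_index(rows, ["desc"])
    cells = enumerate_cells(rows, ["month", "city"], ["laptop", "phone"], idx)
    print(top_k(cells, 2))  # → [('*', 'Xiamen'), ('Jan', '*')]
```

Because refining a cell can only drop tuples, a cell that misses a keyword is pruned together with all of its refinements. Even with that pruning, exhaustive enumeration grows with the number of dimension values, which is why a dedicated top-k search with an index is attractive.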

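[Note: The paper has been accepted and is awaiting publication. The PDF will be released on the official website of the Journal of Computer Research and Development.]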