《软件学报》论文:用基于移动均值的索引实现时间序列相似查询

林子雨注:本文是在北京大学数据库实验室攻读博士学位期间(2005年9月到2009年7月,四年制博士)发表。

[LYW08]林子雨, 杨冬青,王腾蛟. 用基于移动均值的索引实现时间序列相似查询. 软件学报. Vol.19, No.9, September 2008, pp.2349-2361.【全文PDF下载】

用基于移动均值的索引实现时间序列相似查询
林子雨1,2+, 杨冬青1,2, 王腾蛟1,2
1(高可信软件技术教育部重点实验室(北京大学),北京 100871)
2(北京大学 信息科学技术学院,北京 100871)
Similarity Search of Time Series with Moving Average Based Indexing
LIN Zi-Yu1,2+, YANG Dong-Qing1,2, WANG Teng-Jiao1,2
1(Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China)
2(School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China)
 

Lin ZY, Yang DQ, Wang TJ. Similarity search of time series with moving average based indexing. Journal of Software, 2008,19(9):2349−2361. http://www.jos.org.cn/1000-9825/19/2349.htm
Abstract: In this paper, a new method, called MABI (Moving Average Based Indexing) is proposed to effectively deal with the issue of ε-search query in subsequence matching. Two important theorems, distance reduction theorem and DRR relation theorem, are proposed here to be the basis of MABI. DRR relation theorem has strong capability in “pruning” those unqualified candidate sequences so as to achieve the purpose of fast similarity search. Furthermore, in order to speed up the process of similarity search, a multi-way balanced tree structure, called BATON* proposed by Jagadish et al., is also introduced here, with some modification, to construct the index from time series. Extensive experiments over a stock exchange dataset show that MABI can achieve desirable performance.

Key words: similarity search; subsequence matching; moving average; time series databases
摘 要: 提出了基于移动均值的索引来解决子序列匹配中的“ε-查询”问题;提出并证明了基于移动均值的缩距定 理和缩距比关系定理,后者具有很好的“裁减”能力,可以在相似查询时淘汰大部分不符合条件的侯选时间序列,从而达到快速相似查找的目的;引入了由Jagadish 等人提出的BATON*-树,并在此基础上适当修改后建立了MABI索引,大大加快了相似查询过程;最后,在一个股票交易数据集上进行了诸多实验,证明了MABI 索引的良好性能.
关键词: 相似查询;子序列匹配;移动均值;时间序列数据库

【全文PDF下载】