Paper: MESHJOIN*: An Algorithm Supporting Streaming Updates in a Real-time Data Warehouse

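[LinLFZ10] LIN Ziyu, LIN Chen, FENG Shaorong, ZHANG Dongzhan. MESHJOIN*: An Algorithm Supporting Streaming Updates in a Real-time Data Warehouse. Journal of Frontiers of Computer Science and Technology, Vol. 4(10), Oct. 2010, pp. 927-939.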

MESHJOIN*: An Algorithm Supporting Streaming Updates in a Real-time Data Warehouse
LIN Ziyu+, LIN Chen, FENG Shaorong, ZHANG Dongzhan
Department of Computer Science, Xiamen University, Xiamen, Fujian 361005, China
+ Corresponding author: E-mail: ziyulin@xmu.edu.cn
 

LIN Ziyu, LIN Chen, FENG Shaorong, et al. MESHJOIN*: An algorithm supporting streaming updates in a real-time data warehouse. Journal of Frontiers of Computer Science and Technology, 2010, 4(10): 927-939.
Abstract: This paper proposes MESHJOIN*, a new algorithm that supports streaming updates in a real-time data warehouse environment. It has three distinguishing features: (1) Relation R is organized into blocks and hashed so that tuples irrelevant to the current join are, as far as possible, never read; this substantially reduces the number of tuples involved in a join and thus improves the efficiency of the join operation. (2) Multi-threaded concurrent join execution is adopted, and the join operations and the read operations on relation R are scheduled according to engineering principles so as to maximize the efficiency of the join algorithm. (3) Real-time tuples and near-real-time tuples are scheduled according to the relationship between the current system service rate and the tuple arrival rate, so that the processing requirements of real-time tuples are met. Experimental results show that MESHJOIN* achieves much better performance than MESHJOIN.
Key words: data warehouse; streaming update; join
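
To make feature (1) concrete, the following is a minimal Python sketch of the general idea of hashing relation R into blocks so that a join reads only the blocks that can match the pending stream tuples. It is an illustration under stated assumptions, not the MESHJOIN* implementation: the function names `build_blocks` and `join_batch`, the parameter `num_blocks`, and the `read_log` list are hypothetical, and the blocks are held in memory here rather than read from disk.

```python
from collections import defaultdict


def build_blocks(relation, key, num_blocks):
    """Partition the rows of R into hash buckets ("blocks") by join key.

    Hypothetical helper: a real system would lay these blocks out on disk.
    """
    blocks = defaultdict(list)
    for row in relation:
        blocks[hash(row[key]) % num_blocks].append(row)
    return blocks


def join_batch(stream_batch, blocks, key, num_blocks, read_log=None):
    """Join a batch of stream tuples against only the blocks they hash to."""
    # Group pending stream tuples by the block their join key maps to.
    by_bucket = defaultdict(list)
    for s in stream_batch:
        by_bucket[hash(s[key]) % num_blocks].append(s)

    results = []
    for bucket, pending in by_bucket.items():
        if read_log is not None:
            read_log.append(bucket)  # record which blocks were touched
        # Blocks that no pending tuple hashes to are never visited.
        for r in blocks.get(bucket, []):
            for s in pending:
                if r[key] == s[key]:
                    results.append({**r, **s})
    return results


if __name__ == "__main__":
    R = [{"id": i, "name": f"item{i}"} for i in range(8)]
    blocks = build_blocks(R, "id", num_blocks=4)
    stream = [{"id": 1, "qty": 5}, {"id": 5, "qty": 2}]
    touched = []
    print(join_batch(stream, blocks, "id", 4, read_log=touched))
    print("blocks read:", touched)
```

In this toy run both stream tuples hash to the same block, so only one of the four blocks is scanned. The multi-threaded scheduling of feature (2) and the real-time versus near-real-time scheduling of feature (3) are not modeled in this sketch.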
