SIDR: Efficient Structure-Aware Intelligent Data Routing in SciHadoop

资源类型: 资源大小: 文档分类: 上传者:张红侠


【标题】SIDR: Efficient Structure-Aware Intelligent Data Routing in SciHadoop

【作者】 Joe B. Buck  Noah Watkins  Greg Levin  Adam Crume  Kleoni Ioannidou  Scott A. Brandt  Carlos Maltzahn  Neoklis Polyzotis 

【摘要】MapReduce is a popular framework for distributed; parallel computation that has started to be used in domains quite different from the web applications for which it was designed; including the processing of big structured data; e.g.; scientific and financial data. Previous work on using MapReduce to process scientific data did not incorporate knowledge of this structure in its internal communications. We show that performance gains can be realized by leveraging knowledge of the structure of the data to minimize and localize communications between nodes; guarantee workload balance across processing nodes; ensure that Reduce tasks start as soon as possible; and create balanced; contiguous output. We implemented these improvements in SciHadoop; a version of the open-source Hadoop MapReduce framework designed for structured scientific data. Our results show total query execution time reductions of up to 29% over SciHadoop with initial results available with only 6% of the query completed; and the resultant output is more efficiently organized; compared to Hadoop.