Apache Hadoop for processing image files using Sequence file


  • Dr. E. Laxmi Lydia
  • Hima Bindu Gogineni
  • Ravva Ravi
  • G Sandhya
  • G. Jose Moses


MapReduce is an approach to processing data in a distributed manner on the Hadoop framework, an open-source platform for handling very large volumes of data. The rapidly growing quantity of multimedia data creates new demands for processing and storage. As an open-source distributed computing framework, Hadoop supports the processing of different forms of data (such as images) across a growing cluster of compute nodes by providing the coordination they require. It can accept a large number of image files and can be used to eliminate duplicate files from the available data. Compressed or encrypted binary data, in particular, cannot be split and can only be read as a single sequential stream. When such files are used as input to a MapReduce job, a single mapper must process the entire file, which can cause a large performance penalty. This paper presents a suitable splittable data format based on SequenceFile, together with MD5 hash results, to improve the efficiency of image processing.

Keywords: Sequence File, MapReduce, Distributed Processing, Image files, Hadoop
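
The duplicate-removal idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes images are available as in-memory (name, bytes) pairs, which loosely mirrors the key/value records of a SequenceFile, and the function names `md5_hex` and `deduplicate` are hypothetical. Two files are treated as duplicates when their MD5 digests match.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the raw bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def deduplicate(images):
    """Keep the first record seen for each distinct MD5 digest.

    `images` is an iterable of (name, payload) pairs, an assumed
    stand-in for the key/value records stored in a SequenceFile.
    """
    seen = set()
    unique = []
    for name, payload in images:
        digest = md5_hex(payload)
        if digest in seen:
            # Same digest as an earlier record: treat as a duplicate.
            continue
        seen.add(digest)
        unique.append((name, payload))
    return unique


# Illustrative data only; the byte strings are placeholders, not real images.
images = [
    ("a.jpg", b"\xff\xd8 cat"),
    ("b.jpg", b"\xff\xd8 dog"),
    ("a_copy.jpg", b"\xff\xd8 cat"),
]
unique_names = [name for name, _ in deduplicate(images)]
```

In a MapReduce setting the same logic would typically be split in two: a mapper emitting the digest as the key, and a reducer keeping one record per key. The single-process version above only shows the hashing and comparison step.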