Distributed Storage Architecture and Hadoop Framework Tools for Crunching Big Data in Clustered Processing System
Terabytes of data are being created in today's information technology world by sources such as cloud computing, social media, the Internet of Things (IoT), and the Internet at large. Effective tools are needed to analyze such large volumes of data and extract information from them. Big Data tools make it possible to extract information from unstructured formats and store it as events, objects, entities, relations, tables, and many other types. An Information Extraction system can extract information from both structured and unstructured data. The purpose of this paper is to provide a good understanding of Big Data tools for the extraction and analysis of terabyte-scale data. Big Data has now begun to play a role in a variety of sectors such as astronomy, economics, chemistry, transport, and research. Departments in these sectors have begun to store very sensitive information to support their growth and functioning. Big Data is concerned with how to extract only the most important information (Data Mining), how to feed the extracted information into a data pipeline, and how to deliver it to the user. Big Data systems store information not only in structured form but also in unstructured form. Different kinds of logic were applied to analyze the features and suitability of Big Data and the various tools associated with it: Kibana, Hadoop, HDFS, Pig, Hive, and Spark.