Hadoop is an open source, distributed processing framework used to develop software for reliable, scalable, distributed computing, the kind of computing needed to keep pace with the growth of big data. It manages both the processing and storage of data for big data applications that run on clustered systems. Hadoop sits at the centre of an emerging ecosystem of big data technologies that support advanced analytics initiatives, including predictive analytics, data mining, and machine learning applications.
At its core, Hadoop comprises modules such as the Hadoop Distributed File System (HDFS), Hadoop MapReduce, and Hadoop YARN. HDFS is a distributed file system that provides high-throughput access to application data; the aim is to spread the processing of large data sets over clusters of low-cost commodity computers. Similarly, Hadoop MapReduce is the component that lets users distribute large data sets across a series of machines for parallel processing, while Hadoop YARN is the framework that manages job scheduling and cluster resources.
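To illustrate the MapReduce model described above, here is a minimal, standalone Python sketch of the classic word-count job. This is not Hadoop code; the function names and sample input are illustrative, and the shuffle step stands in for the grouping Hadoop performs between the map and reduce phases.

```python
from collections import defaultdict
from itertools import chain

# Map phase: turn each input line into (word, 1) pairs.
def map_phase(line):
    return [(word, 1) for word in line.split()]

# Shuffle: group all mapped values by key, as Hadoop does
# between the map and reduce phases.
def shuffle(mapped_pairs):
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

# Reduce phase: combine the grouped values for each key.
def reduce_phase(key, values):
    return key, sum(values)

lines = ["big data needs big clusters", "big data"]
mapped = list(chain.from_iterable(map_phase(l) for l in lines))
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 3, 'data': 2, 'needs': 1, 'clusters': 1}
```

In a real cluster, the map and reduce functions run in parallel on many machines, with HDFS supplying the input splits and YARN scheduling the tasks.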
To read the entire article, please click on https://cloud.cioreview.com/news/hadoop-and-its-relation-with-cloud-nid-26704-cid-17.html