Hadoop


Ecosystem

  • The Hadoop ecosystem includes HDFS, Hive, Pig, YARN, MapReduce, Spark, HBase, Oozie, Sqoop, ZooKeeper, and others.

HDFS

  • The Hadoop Distributed File System (HDFS) is a core part of the Apache Hadoop project and the primary storage system of Hadoop.

  • It uses a NameNode and DataNode architecture: the NameNode holds the file system metadata, and the DataNodes store the actual data blocks.

  • It is a distributed file system that stores large files across a cluster of commodity hardware.
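The NameNode and DataNode split can be illustrated with a small toy model. The sketch below is not HDFS code: the function name plan_blocks and the DataNode names are hypothetical, and replicas are placed round-robin rather than with HDFS's rack-aware policy. The 128 MiB block size and replication factor of 3 are the usual HDFS defaults.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB, the usual default for dfs.blocksize
REPLICATION = 3                 # the usual default for dfs.replication


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a toy NameNode block map: block index -> list of DataNode names.

    Assumption: replicas are placed round-robin. Real HDFS uses a
    rack-aware placement policy instead.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    num_blocks = math.ceil(file_size / block_size)
    block_map = {}
    for block in range(num_blocks):
        # Each block is stored on `replication` distinct DataNodes.
        block_map[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return block_map


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
    for block, replicas in plan_blocks(300 * 1024 * 1024, nodes).items():
        print(block, replicas)
```

In this model a 300 MiB file is split into three blocks, and the block map (the kind of metadata a NameNode keeps) records which DataNodes hold each replica.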

YARN

  • YARN stands for Yet Another Resource Negotiator.

  • It is one of the core components of open-source Apache Hadoop and is responsible for resource management.

  • It manages and monitors workloads and implements security controls.

  • It also allocates system resources to the applications running in a Hadoop cluster and schedules which tasks each cluster node executes.

YARN has two main components:

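  • ResourceManager: arbitrates resources among all applications in the cluster.

  • NodeManager: runs on each node, launching containers and monitoring their resource usage.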
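The division of labour between the two components can be sketched with a toy model. This is not the YARN API: the class and method names below are hypothetical stand-ins, and the first-fit placement is a simplification of YARN's real schedulers (such as the Capacity and Fair schedulers).

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for a NodeManager: tracks free resources on one node."""
    name: str
    free_memory_mb: int
    free_vcores: int


@dataclass
class ResourceManager:
    """Toy stand-in for the ResourceManager: places containers on nodes."""
    nodes: list
    allocations: list = field(default_factory=list)

    def allocate(self, app_id, memory_mb, vcores):
        """First-fit placement; returns the node name, or None if nothing fits."""
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb and node.free_vcores >= vcores:
                # Reserve the resources on the chosen node.
                node.free_memory_mb -= memory_mb
                node.free_vcores -= vcores
                self.allocations.append((app_id, node.name, memory_mb, vcores))
                return node.name
        return None


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("node1", 4096, 2), NodeManager("node2", 8192, 4)])
    print(rm.allocate("app1", 3072, 1))
    print(rm.allocate("app2", 2048, 1))
    print(rm.allocate("app3", 16384, 1))
```

Each request names an application and the memory and virtual cores it needs; the toy ResourceManager either reserves them on a node or reports that no node can host the container.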

Pig

  • Pig is a high-level scripting platform for running queries over large datasets stored in Hadoop. Its scripting language, Pig Latin, resembles SQL in places but describes a step-by-step data flow. Its main objective is to perform the required operations and arrange the final output in the desired format.
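To make the data-flow style concrete, the sketch below mirrors a typical Pig Latin pipeline (LOAD, FILTER, GROUP, FOREACH ... GENERATE) in plain Python. It does not run on Hadoop or call Pig. The file name visits.csv, the field names, and the function slow_visits_per_user are hypothetical and serve only as illustration; the corresponding Pig Latin statements appear as comments.

```python
from collections import defaultdict

# Roughly equivalent Pig Latin (hypothetical file and field names):
#   records = LOAD 'visits.csv' USING PigStorage(',')
#             AS (user:chararray, url:chararray, ms:int);
#   slow    = FILTER records BY ms > 100;
#   by_user = GROUP slow BY user;
#   counts  = FOREACH by_user GENERATE group, COUNT(slow);


def slow_visits_per_user(records, threshold_ms=100):
    """Count visits slower than threshold_ms for each user."""
    # FILTER: keep only the slow visits.
    slow = [r for r in records if r[2] > threshold_ms]
    # GROUP: collect the remaining rows by user.
    by_user = defaultdict(list)
    for user, url, ms in slow:
        by_user[user].append((user, url, ms))
    # FOREACH ... GENERATE: emit one (user, count) pair per group, sorted by user.
    return sorted((user, len(rows)) for user, rows in by_user.items())


if __name__ == "__main__":
    records = [
        ("alice", "/a", 120),
        ("bob", "/b", 80),
        ("alice", "/c", 300),
        ("carol", "/a", 150),
    ]
    print(slow_visits_per_user(records))
```

Each Pig Latin statement names an intermediate relation, which is why a script reads as a sequence of transformations rather than as a single declarative query.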