Hadoop
Ecosystem
The Hadoop ecosystem includes HDFS, Hive, Pig, YARN, MapReduce, Spark, HBase, Oozie, Sqoop, ZooKeeper, and others.
HDFS
The Hadoop Distributed File System (HDFS) is one of the largest Apache projects and the primary storage system of Hadoop.
It employs a NameNode and DataNode architecture: the NameNode keeps the file system metadata, and the DataNodes store the actual data blocks.
It is a distributed file system that can store large files across a cluster of commodity hardware.
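The NameNode and DataNode split can be illustrated with a small sketch. This is a toy model, not the HDFS implementation: the function name plan_blocks and the round-robin placement are assumptions made for illustration (real HDFS uses a rack-aware placement policy). The 128 MiB block size and replication factor of 3 match the usual defaults in recent Hadoop releases.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB, the usual default block size
REPLICATION = 3                 # the usual default replication factor


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a toy NameNode-style block map: block index -> list of DataNodes.

    Hypothetical helper for illustration only; placement is simple round-robin.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    num_blocks = math.ceil(file_size / block_size)
    block_map = {}
    for i in range(num_blocks):
        # Each block is copied to `replication` distinct DataNodes.
        block_map[i] = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
    return block_map


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    for block, holders in plan_blocks(300 * 1024 * 1024, nodes).items():
        print(block, holders)
```

In this model, a 300 MiB file is split into three blocks, and the block map plays the role of the metadata that the NameNode would keep, while the listed nodes stand in for the DataNodes holding each replica.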
YARN
YARN stands for Yet Another Resource Negotiator.
It is one of the core components of open-source Apache Hadoop and handles resource management.
It is responsible for managing workloads, monitoring, and implementing security controls.
It also allocates system resources to the various applications running in a Hadoop cluster and decides which tasks each cluster node should execute.
YARN has two main components:
Resource Manager
Node Manager
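The division of labor between the two components can be sketched as follows. This is a simplified model under stated assumptions, not YARN's actual scheduler or API: the class names mirror the component names, but the allocate method, the first-fit strategy, and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    """Toy stand-in for a per-node agent that reports its free capacity."""
    name: str
    free_memory_mb: int
    free_vcores: int


class ResourceManager:
    """Toy stand-in for the cluster-wide arbiter of resources."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, app_id, memory_mb, vcores):
        """Place one container for app_id using first-fit (an assumption).

        Returns the name of the chosen node, or None if no node has room.
        """
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb and node.free_vcores >= vcores:
                # Reserve the resources on the chosen node.
                node.free_memory_mb -= memory_mb
                node.free_vcores -= vcores
                return node.name
        return None


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("nm1", 4096, 2), NodeManager("nm2", 8192, 4)])
    print(rm.allocate("app-1", 3072, 2))
    print(rm.allocate("app-2", 2048, 1))
    print(rm.allocate("app-3", 16384, 1))
```

Real YARN schedulers (for example the Capacity and Fair schedulers) add queues, priorities, and locality preferences on top of this basic idea of matching container requests to free node capacity.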
Pig
Pig is a high-level scripting platform used to run queries over large datasets stored in Hadoop. Its simple, SQL-like scripting language is known as Pig Latin; its main objective is to perform the required operations and arrange the final output in the desired format.
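The kind of data flow a Pig Latin script typically expresses (filter, group, aggregate, format the output) can be sketched in plain Python. This is only an analogy, not Pig itself: the sample records, the function name total_time_per_user, and the min_seconds threshold are all hypothetical, and the comments name the Pig Latin operators each step roughly corresponds to.

```python
from collections import defaultdict

# Hypothetical input: (user, url, seconds_on_page) tuples.
visits = [
    ("alice", "/home", 12),
    ("bob", "/home", 3),
    ("alice", "/docs", 40),
    ("carol", "/docs", 25),
]


def total_time_per_user(records, min_seconds=5):
    """Filter, group, and aggregate records in the style of a Pig data flow."""
    # Roughly FILTER ... BY: drop very short visits.
    kept = [r for r in records if r[2] >= min_seconds]

    # Roughly GROUP ... BY user.
    groups = defaultdict(list)
    for user, _url, seconds in kept:
        groups[user].append(seconds)

    # Roughly FOREACH ... GENERATE group, SUM(...), then ORDER BY user.
    return sorted((user, sum(secs)) for user, secs in groups.items())


if __name__ == "__main__":
    for user, total in total_time_per_user(visits):
        print(user, total)
```

In Pig, each of these steps would be one statement in the script, and the platform would translate the whole flow into jobs that run on the Hadoop cluster rather than in a single process.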