Rack awareness means knowing the cluster topology, or more specifically how the data nodes are distributed across the racks of a Hadoop cluster. This knowledge matters because of one assumption: data nodes collocated in the same rack have more bandwidth and lower latency between them, whereas two data nodes in separate racks have comparatively less bandwidth and higher latency.
Hadoop components are rack-aware. For example, HDFS block placement uses rack awareness for fault tolerance by placing one block replica on a different rack. This keeps data available in the event of a network switch failure or a partition within the cluster.
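The placement idea can be sketched in a few lines of Python. This is a simplified illustration and not HDFS code: it assumes a replication factor of 3 and follows the commonly described default (first replica on the writer's node, second on a different rack, third on the same rack as the second but on another node). The function name `place_replicas`, the `rack_of` mapping and the `/rack1` style paths are hypothetical, and real placement also weighs free space and node load.

```python
import random

def place_replicas(writer, rack_of, rng=None):
    """Pick three data nodes for one block (simplified, illustrative only)."""
    rng = rng or random.Random()
    first = writer  # replica 1: the node that writes the block
    # Replica 2: any node that sits on a different rack than replica 1.
    remote = [n for n in rack_of if rack_of[n] != rack_of[first]]
    second = rng.choice(remote)
    # Replica 3: same rack as replica 2, but a different node.
    neighbours = [n for n in rack_of
                  if rack_of[n] == rack_of[second] and n != second]
    third = rng.choice(neighbours)
    return [first, second, third]

# Hypothetical layout: nine data nodes spread over three racks.
rack_of = {f"DN{i}": f"/rack{(i - 1) // 3 + 1}" for i in range(1, 10)}

replicas = place_replicas("DN1", rack_of, random.Random(0))
print(replicas, [rack_of[n] for n in replicas])
```

Whatever nodes are drawn, the block ends up on exactly two racks, so losing a single rack never removes every replica.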
The main purposes of rack awareness are:
- Increasing the availability of data blocks.
- Better cluster performance.
Let us assume a cluster of nine data nodes (DN1 to DN9) placed in 3 physical racks:
Rack1: DN1;DN2;DN3
Rack2: DN4;DN5;DN6
Rack3: DN7;DN8;DN9
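Hadoop learns a layout like this from an administrator-supplied mapping, typically a script referenced by the `net.topology.script.file.name` property in core-site.xml. Hadoop passes one or more node addresses as arguments and reads one rack path per argument from standard output. The following is a minimal sketch of such a script for the layout above. The rack paths `/rack1` to `/rack3` and the name `resolve_rack` are assumptions chosen for illustration, and a real script would usually need to match IP addresses as well as host names.

```python
import sys

# Assumed rack paths for the nine example nodes; adjust to the real cluster.
RACK_OF = {
    "DN1": "/rack1", "DN2": "/rack1", "DN3": "/rack1",
    "DN4": "/rack2", "DN5": "/rack2", "DN6": "/rack2",
    "DN7": "/rack3", "DN8": "/rack3", "DN9": "/rack3",
}

DEFAULT_RACK = "/default-rack"  # fallback for nodes missing from the table

def resolve_rack(host):
    """Return the rack path for one host, or the default rack if unknown."""
    return RACK_OF.get(host, DEFAULT_RACK)

if __name__ == "__main__":
    # One rack path per argument, in the same order as the arguments.
    print(" ".join(resolve_rack(host) for host in sys.argv[1:]))
```

Saved as an executable file, running it with the arguments `DN1 DN5 DN42` would print `/rack1 /rack2 /default-rack`.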
The following diagram depicts an example block placement when HDFS and YARN are not rack aware:
- What happens if Rack1 goes down? The data in Block1 might be lost.
- Without rack awareness, the entire cluster is treated as if it were placed in a single rack, default-rack.
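The risk can be made concrete with a small check against the rack layout above. This is a sketch under two assumptions that are not stated in the article: Block1 is taken to have all three replicas on DN1, DN2 and DN3, and a cluster that is not rack aware is modelled as picking any three distinct nodes with equal probability, which is a simplification. The helper name `survives_rack_failure` is hypothetical.

```python
from itertools import combinations

# Physical layout from the example above.
PHYSICAL_RACK = {
    "DN1": "Rack1", "DN2": "Rack1", "DN3": "Rack1",
    "DN4": "Rack2", "DN5": "Rack2", "DN6": "Rack2",
    "DN7": "Rack3", "DN8": "Rack3", "DN9": "Rack3",
}

def survives_rack_failure(replicas, failed_rack):
    """True if at least one replica lives outside the failed rack."""
    return any(PHYSICAL_RACK[node] != failed_rack for node in replicas)

# Assumed placement: all three replicas of Block1 happen to sit in Rack1.
block1 = ["DN1", "DN2", "DN3"]
print(survives_rack_failure(block1, "Rack1"))  # False

# Count the 3-node placements that fall entirely inside one physical rack.
placements = list(combinations(PHYSICAL_RACK, 3))
single_rack = [p for p in placements
               if len({PHYSICAL_RACK[n] for n in p}) == 1]
print(len(single_rack), "of", len(placements))  # 3 of 84
```

Under these assumptions, roughly 1 in 28 blocks would keep every replica inside a single rack, and each of those blocks becomes unavailable when that rack fails.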