Wednesday, February 8, 2017

Calculating the Capacity of a Node

Because YARN has removed the hard-partitioned mapper and reducer slots of Hadoop
Version 1, new capacity calculations are required. There are eight important parameters
for calculating a node's capacity, specified in mapred-site.xml and yarn-site.xml:

In mapred-site.xml:
1. mapreduce.map.memory.mb
2. mapreduce.reduce.memory.mb
These are the hard memory limits enforced by Hadoop on each mapper or reducer task.

3. mapreduce.map.java.opts
4. mapreduce.reduce.java.opts
The JVM heap size (-Xmx) for the mapper or reducer task. Remember to leave room
for the JVM Permanent Generation and native libraries used. This value should always be lower than
mapreduce.[map|reduce].memory.mb.
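As a sketch, these four properties might be set in mapred-site.xml like this (the values shown are the ones used in the example later in this post):

```xml
<!-- mapred-site.xml (illustrative values) -->
<configuration>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2560</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024m</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2048m</value>
  </property>
</configuration>
```

Note the heap settings leave 512MB of headroom below each container's hard limit.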

In yarn-site.xml:

• yarn.scheduler.minimum-allocation-mb
The smallest container that YARN will allow.

• yarn.scheduler.maximum-allocation-mb
The largest container that YARN will allow.

• yarn.nodemanager.resource.memory-mb
The amount of physical memory (RAM) available for Containers on the compute node. It is
important that this is less than the total amount of RAM on the node, as other
Hadoop services also require RAM.

• yarn.nodemanager.vmem-pmem-ratio
The ratio that determines how much virtual memory each Container is allowed. A Container's virtual memory limit can be calculated as:

containerMemoryRequest*vmem-pmem-ratio
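Similarly, the yarn-site.xml side of the example configuration might look like this (values again from the example below):

```xml
<!-- yarn-site.xml (illustrative values) -->
<configuration>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>36864</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
</configuration>
```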

As an example, consider a configuration with the settings in the following table.

Example YARN MapReduce Settings

Property                               Value
-------------------------------------  ---------
mapreduce.map.memory.mb                1536
mapreduce.reduce.memory.mb             2560
mapreduce.map.java.opts                -Xmx1024m
mapreduce.reduce.java.opts             -Xmx2048m
yarn.scheduler.minimum-allocation-mb   512
yarn.scheduler.maximum-allocation-mb   4096
yarn.nodemanager.resource.memory-mb    36864
yarn.nodemanager.vmem-pmem-ratio       2.1


With these settings, each map and reduce task has a generous 512MB of overhead
for the Container, as evidenced by the difference between the mapreduce.[map|
reduce].memory.mb and the mapreduce.[map|reduce].java.opts.

Next, YARN has been configured to allow a Container no smaller than 512MB and no
larger than 4GB. The compute nodes have 36GB of RAM available for Containers. With
a virtual memory ratio of 2.1 (the default value), each map container can use up to
3225.6MB of virtual memory and each reduce container up to 5376MB.

This means that the compute node configured for 36GB of Container space can support
up to 24 maps or 14 reducers, or any combination of mappers and reducers allowed by the
available resources on the node.
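The arithmetic above can be sketched in a few lines of Python (the variable names are illustrative, chosen to mirror the property names):

```python
# Node capacity arithmetic for the example settings above.
map_memory_mb = 1536        # mapreduce.map.memory.mb
reduce_memory_mb = 2560     # mapreduce.reduce.memory.mb
node_memory_mb = 36864      # yarn.nodemanager.resource.memory-mb (36GB)
vmem_pmem_ratio = 2.1       # yarn.nodemanager.vmem-pmem-ratio

# Maximum concurrent tasks if the node ran only maps or only reduces.
max_maps = node_memory_mb // map_memory_mb        # 36864 / 1536 = 24
max_reduces = node_memory_mb // reduce_memory_mb  # 36864 / 2560 = 14 (floor of 14.4)

# Virtual memory ceiling per container.
map_vmem_mb = map_memory_mb * vmem_pmem_ratio        # 3225.6
reduce_vmem_mb = reduce_memory_mb * vmem_pmem_ratio  # 5376.0

print(max_maps, max_reduces)  # prints: 24 14
```

In practice the scheduler mixes mappers and reducers on the node, so these figures are the two extremes of the capacity range.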
