Monday, January 22, 2018

General HBase Tuning

When you tune HBase, you can improve the performance and balance the memory usage.

Updating environment variables (hbase-env.sh)

Depending on how much memory is available on the cluster nodes, you can use environment variables to tune the memory that is available to the HBase master server and the HBase region servers. You can also configure the garbage collector. As part of the HBase tuning process, consider the MapReduce workload and the memory that is allocated to the MapReduce JVMs, because they compete with HBase for the same memory on each node.

The environment variables that help you control performance in HBase are in the file $BIGINSIGHTS_HOME/hdm/components/hbase/conf/hbase-env.sh. After you change this file, run the following command to synchronize the HBase configuration across all nodes of the cluster:
$BIGINSIGHTS_HOME/bin/syncconf.sh hbase

Then stop and restart HBase with the following commands:

$BIGINSIGHTS_HOME/bin/stop.sh hbase
$BIGINSIGHTS_HOME/bin/start.sh hbase
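The sync-then-restart sequence above can be wrapped in a small shell function so that it is never left half-done. This is a minimal sketch: the function name apply_hbase_env_changes is hypothetical, and it assumes only the three BigInsights scripts already shown, stopping at the first one that fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BigInsights): push an edited HBase
# configuration to every node, then restart HBase so it takes effect.
# Assumes syncconf.sh, stop.sh and start.sh exist under $BIGINSIGHTS_HOME/bin.
apply_hbase_env_changes() {
  # Refuse to run if the BigInsights home directory is unknown.
  if [ -z "${BIGINSIGHTS_HOME:-}" ]; then
    echo "BIGINSIGHTS_HOME is not set" >&2
    return 1
  fi
  local bin="$BIGINSIGHTS_HOME/bin"
  # 1. Synchronize the HBase configuration across all nodes of the cluster.
  "$bin/syncconf.sh" hbase || return 1
  # 2. Stop HBase; do not continue if the stop step failed.
  "$bin/stop.sh" hbase || return 1
  # 3. Start HBase again with the synchronized configuration.
  "$bin/start.sh" hbase
}
```

After editing hbase-env.sh, source the file that defines the function and run apply_hbase_env_changes.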

Master and region server memory

Each region server hosts a set of regions, and each region contains all of the data in one contiguous range of row keys.

The HBASE_HEAPSIZE value is the maximum amount of heap to use, in MB. The default is 1000,
which is small for an HBase system that is used regularly in your cluster. To achieve good performance, give HBase as much memory as you can without making the node swap. The example uses a value of 8000, but you should tune the size based on your environment and workloads.

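You can increase the HBase master server JVM heap size by setting HBASE_HEAPSIZE. In the stock HBase scripts, this value applies to every HBase daemon unless that daemon's own options override it: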

export HBASE_HEAPSIZE=8000

Then, increase the JVM heap size for the region servers. Options in HBASE_REGIONSERVER_OPTS apply only to the region servers:

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms8g -Xmx8g"
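Before committing to a heap such as 8000 MB, it can help to check that the heap, plus memory for the operating system, the other Hadoop daemons, and the MapReduce JVMs mentioned earlier, still fits in physical RAM, because a heap that pushes the node into swapping defeats the purpose. The sketch below assumes a Linux node (it reads MemTotal from /proc/meminfo); the function name heap_fits and the reserve figure you pass in are hypothetical and should come from your own cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (status 0) if a planned HBase heap plus a
# reserve for everything else on the node fits in physical RAM.
# Linux only: reads MemTotal (reported in kB) from /proc/meminfo by default.
# Usage: heap_fits HEAP_MB RESERVE_MB [MEMINFO_FILE]
# Status: 0 = fits, 1 = does not fit, 2 = MemTotal could not be read.
heap_fits() {
  local heap_mb=$1 reserve_mb=$2 meminfo=${3:-/proc/meminfo}
  local total_kb total_mb
  total_kb=$(awk '/^MemTotal:/ { print $2 }' "$meminfo") || return 2
  [ -n "$total_kb" ] || return 2
  # Convert kB to MB so it can be compared with HBASE_HEAPSIZE.
  total_mb=$(( total_kb / 1024 ))
  [ $(( heap_mb + reserve_mb )) -le "$total_mb" ]
}
```

For example, heap_fits 8000 4096 && echo ok checks an 8000 MB heap while setting aside 4096 MB for everything else on the node.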

Garbage collection

HBase relies on the JVM garbage collection subsystem instead of managing memory explicitly. Garbage collection is an automated system that handles both the allocation and reclamation of memory for Java objects.

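For a JVM with a heap smaller than 4 GB, use the gencon policy (-Xgcpolicy:gencon). Note that -Xgcpolicy and the other garbage collection options in this section are IBM JVM options; HotSpot-based JVMs such as Oracle's do not accept them. A suggested best practice is the following setting:
-Xms3000m -Xmx3000m -Xgcpolicy:gencon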

The -Xms<size> option sets the initial size of the heap, and the -Xmx<size> option sets the maximum size.
For a JVM with a heap larger than 4 GB, use the balanced policy (-Xgcpolicy:balanced). With this policy, you do not need to set anything beyond the initial and maximum heap sizes:

-Xms8192m -Xmx8192m -Xgcpolicy:balanced
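The 4 GB rule above can be expressed as a small function that prints the matching IBM JVM options for a given heap size. This is a sketch, not an official tool: gc_opts_for_heap is a hypothetical name, the 4 GB threshold is taken as 4096 MB, and a heap of exactly 4096 MB, which the text does not cover, is assumed to use balanced.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print -Xms/-Xmx/-Xgcpolicy options for a heap size in MB,
# using gencon below 4096 MB and balanced otherwise (IBM JVM options).
# Usage: gc_opts_for_heap HEAP_MB
gc_opts_for_heap() {
  local heap_mb=$1 policy
  # Accept only a plain integer number of megabytes.
  case $heap_mb in
    ''|*[!0-9]*) echo "heap size must be an integer number of MB" >&2; return 1 ;;
  esac
  if [ "$heap_mb" -lt 4096 ]; then
    policy=gencon
  else
    policy=balanced
  fi
  # Same initial and maximum size, as in the examples above.
  printf '%s\n' "-Xms${heap_mb}m -Xmx${heap_mb}m -Xgcpolicy:${policy}"
}
```

For example, gc_opts_for_heap 3000 prints -Xms3000m -Xmx3000m -Xgcpolicy:gencon, and in hbase-env.sh a line such as export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $(gc_opts_for_heap 8192)" would add -Xms8192m -Xmx8192m -Xgcpolicy:balanced.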

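You can set additional garbage collection options in HBASE_OPTS. For example, the following line limits the collector to two threads (-Xgcthreads2), selects the gencon policy, and performs class unloading checks during every global collection (-Xalwaysclassgc):
export HBASE_OPTS="$HBASE_OPTS -Xgcthreads2 -Xgcpolicy:gencon -Xalwaysclassgc"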

Updating configuration values (hbase-site.xml)

HBase site-specific customizations are in the file hbase-site.xml. Navigate to the $BIGINSIGHTS_HOME/hdm/components/hbase/conf directory and edit the hbase-site.xml file.
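Before syncing an edited hbase-site.xml, it can be useful to confirm which value a property is actually set to. The sketch below, with the hypothetical function name hbase_site_get, is a plain awk lookup rather than a real XML parser: it assumes the layout used in this post, with <name> and <value> on separate lines inside each <property> block.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one property from hbase-site.xml.
# Not a real XML parser; it assumes <name> and <value> sit on their own lines
# inside each <property> block, as in the examples in this post.
# Usage: hbase_site_get PROPERTY_NAME [SITE_FILE]
# Exits with status 1 if the property is not found.
hbase_site_get() {
  local name=$1 file=${2:-hbase-site.xml}
  awk -v want="$name" '
    # Forget any match when a property block ends.
    /<\/property>/ { found = 0 }
    # Note when we reach the <name> line of the requested property.
    index($0, "<name>" want "</name>") { found = 1; next }
    # The next <value> line holds the setting: strip the tags and spaces.
    found && /<value>/ {
      sub(/.*<value>[ \t]*/, "")
      sub(/[ \t]*<\/value>.*/, "")
      print
      done = 1
      exit
    }
    END { if (!done) exit 1 }
  ' "$file"
}
```

For example, hbase_site_get hbase.regionserver.handler.count "$BIGINSIGHTS_HOME/hdm/components/hbase/conf/hbase-site.xml" prints the handler count currently configured in that file.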

hbase.regionserver.handler.count

This parameter defines the number of threads that are kept open to answer incoming requests to user tables. The default value is 30.

A rule of thumb is to keep the value low when the payload of each request is large, and high when the payload is small. Increase hbase.regionserver.handler.count to approximately the number of CPUs on the region servers:

<property>
  <name>hbase.regionserver.handler.count</name>
  <value>64</value>
</property>
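The CPU-count rule of thumb can be turned into a generated property block. This sketch assumes it runs on a region server (or that you pass that server's CPU count), uses nproc from GNU coreutils for the default, and the function name handler_count_property is hypothetical. It applies only the CPU rule; if your requests carry large payloads, the guidance above suggests choosing a lower value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an hbase-site.xml <property> block that sets
# hbase.regionserver.handler.count to the CPU count (the rule of thumb above).
# Usage: handler_count_property [CPU_COUNT]
# CPU_COUNT defaults to this machine's count from nproc (GNU coreutils).
handler_count_property() {
  local cpus=${1:-$(nproc)}
  # Accept only a positive integer without leading zeros.
  case $cpus in
    ''|*[!0-9]*|0*) echo "bad CPU count: $cpus" >&2; return 1 ;;
  esac
  cat <<EOF
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>${cpus}</value>
</property>
EOF
}
```

For example, handler_count_property 64 prints the same block as the example above.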

hbase.hregion.max.filesize

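This parameter is the maximum HStoreFile size, in bytes: when a store file in a region grows beyond it, the region is split in two. The default value is 10737418240 (10 GB). Decrease this value so that tables are divided into more, smaller regions. Big SQL determines the number of mappers based on the region size, with one mapper for each region, so smaller regions mean more mappers. The following example sets it to 1073741824 (1 GB):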

<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value>
</property>
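Because Big SQL starts one mapper per region, a quick ceiling division shows how the region size setting changes the mapper count. This is rough arithmetic, not an HBase API: estimate_regions is a hypothetical name, and it assumes each region fills up to hbase.hregion.max.filesize before it splits, so real tables (with compression, several column families, or earlier splits) will differ and the result is only an estimate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough number of regions (and so, per the text above,
# Big SQL mappers) for a table, assuming each region grows to
# hbase.hregion.max.filesize before it splits.
# Usage: estimate_regions TABLE_BYTES MAX_FILESIZE_BYTES
estimate_regions() {
  local table_bytes=$1 max_bytes=$2
  if [ "$max_bytes" -le 0 ]; then
    echo "max file size must be positive" >&2
    return 1
  fi
  # Ceiling division: a partly filled last region is still a region.
  echo $(( (table_bytes + max_bytes - 1) / max_bytes ))
}
```

For a 100 GB table (107374182400 bytes), estimate_regions 107374182400 10737418240 prints 10 with the default setting, and estimate_regions 107374182400 1073741824 prints 100 with the 1 GB setting, which means roughly ten times as many mappers.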

hbase.client.write.buffer

This parameter is the size of the HTable client write buffer, in bytes. The default value is 2097152 (2 MB).
A bigger buffer takes more memory on both the client and server side, but it reduces the number of remote procedure calls (RPCs) that are made. Increase the hbase.client.write.buffer value, for example to 8388608 (8 MB):

<property>     
  <name>hbase.client.write.buffer</name>     
  <value>8388608</value>  
</property>
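The effect of the write buffer size on how often the client flushes can be estimated with simple arithmetic. This sketch assumes that client-side buffering is in effect (for example, auto-flush turned off) and that each flush happens when the buffer is full; write_buffer_flushes is a hypothetical name. A flush may still send its batch as more than one RPC, so this counts batches, not exact RPCs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate how many times the client write buffer is
# flushed while writing TOTAL_BYTES of puts, assuming a flush happens each
# time the buffer fills (plus one final partial flush).
# Usage: write_buffer_flushes TOTAL_BYTES [BUFFER_BYTES]
# BUFFER_BYTES defaults to 2097152, the default hbase.client.write.buffer.
write_buffer_flushes() {
  local total=$1 buffer=${2:-2097152}
  if [ "$buffer" -le 0 ]; then
    echo "buffer size must be positive" >&2
    return 1
  fi
  # Ceiling division: a partly filled last buffer still needs a flush.
  echo $(( (total + buffer - 1) / buffer ))
}
```

For 1 GB of puts, write_buffer_flushes 1073741824 prints 512 with the default buffer, and write_buffer_flushes 1073741824 8388608 prints 128 with the 8 MB buffer: four times fewer batches, in exchange for four times more buffer memory.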

hbase.client.scanner.caching

This parameter is the number of rows that are fetched when next() is called on a scanner, if the rows are not already served from the client's memory. The default value is 100.

A higher caching value makes scanners faster, but it uses more memory, and individual next() calls can take longer when the cache is empty and a new batch must be fetched. Increase the scanner cache size to improve the performance of large reads, for example to 10000:

<property>    
  <name>hbase.client.scanner.caching</name>    
  <value>10000</value>  
</property>
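The trade-off between round trips and memory can be put into numbers. In this sketch, scan_cost is a hypothetical name, the average row size is an input you must estimate for your own table, and it ignores any per-call result-size limit your HBase version may apply, so treat the output as approximate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate cost of a full scan for a given
# hbase.client.scanner.caching value.
#   round_trips = number of next() batches fetched from the region servers
#   batch_bytes = client memory needed to hold one cached batch
# Usage: scan_cost ROWS CACHING AVG_ROW_BYTES
scan_cost() {
  local rows=$1 caching=$2 row_bytes=$3 trips batch_bytes
  if [ "$caching" -le 0 ]; then
    echo "caching must be positive" >&2
    return 1
  fi
  # Ceiling division: a partly filled last batch still needs a round trip.
  trips=$(( (rows + caching - 1) / caching ))
  batch_bytes=$(( caching * row_bytes ))
  echo "round_trips=$trips batch_bytes=$batch_bytes"
}
```

For 1,000,000 rows of about 1 KB each, scan_cost 1000000 100 1024 prints round_trips=10000 batch_bytes=102400, while scan_cost 1000000 10000 1024 prints round_trips=100 batch_bytes=10240000: one hundred times fewer trips to the server, but roughly 10 MB of client memory per batch instead of about 100 KB.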
