Symptoms may include:
• Process appears to start, but then disappears from the process list.
• Node Manager cannot bind to interface.
• Kernel panic or system halt.
Potential Root Cause: Existing Process Bound to Port
Troubleshooting Steps:
• Examine the bound ports to ensure that no other process has already bound to the Resource Manager or Node Manager port (example commands below).
Resolution Steps:
• Resolve the port conflict before attempting to restart the Resource Manager/Node Manager.
Information to Collect:
• List of bound interfaces/ports and the processes that own them.
• Resource Manager log.
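One way to check for a conflicting process is to list the listening sockets together with their owning PIDs. This is a minimal sketch that assumes the stock port defaults (8088 for the Resource Manager web UI, 8032 for its RPC address, 8042 for the Node Manager web UI); substitute whatever ports your yarn-site.xml configures.

# Listening TCP/UDP sockets with owning PID and process name (run as root)
netstat -tulpn | grep -E ':(8088|8032|8042)\b'
# Alternative on systems that ship ss instead of netstat
ss -lptn
# Identify the process holding one specific port, e.g. 8088
lsof -i :8088

If another process owns the port, stop it or move the YARN service to a free port before restarting the daemon.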
Potential Root Cause: Incorrect File Permissions
Troubleshooting Steps:
• Verify that all Hadoop file system permissions are set properly.
• Verify the Hadoop configurations.
Resolution Steps:
• Follow the procedures for handling failure due to file permissions (see Hortonworks KB Solutions/Articles).
• Fix any incorrect configuration.
Information to Collect:
• Dump of file system permissions, ownership, and flags for the Node Manager local directories. Find them by looking up the yarn.nodemanager.local-dirs property in the yarn-site.xml file; in this case it has a value of "/hadoop/yarn/local". From the command line, run:
ls -lR /hadoop/yarn/local
• Resource Manager log.
• Node Manager log.
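If you prefer to pull the directory list straight out of the configuration rather than reading the XML by eye, the sketch below shows one way to do it. The path /etc/hadoop/conf/yarn-site.xml is an assumption (the usual client-config location on HDP) and xmllint must be installed; adjust both to your environment.

# Extract the value of yarn.nodemanager.local-dirs from the config file
xmllint --xpath "//property[name='yarn.nodemanager.local-dirs']/value/text()" /etc/hadoop/conf/yarn-site.xml
# Quick-and-dirty alternative without xmllint
grep -A1 'yarn.nodemanager.local-dirs' /etc/hadoop/conf/yarn-site.xml
# Dump permissions, ownership, and flags for the configured directory
ls -lR /hadoop/yarn/local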
Potential Root Cause: Incorrect Name-to-IP Resolution
Troubleshooting Steps:
• Verify that the name/IP resolution is correct for all nodes in the cluster.
Resolution Steps:
• Fix any incorrect configuration.
Information to Collect:
• Local hosts file for all hosts on the system (/etc/hosts).
• Resolver configuration (/etc/resolv.conf).
• Network configuration (/etc/sysconfig/network-scripts/ifcfg-ethX, where X is the number of the interface card).
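To verify resolution from the command line, check that forward and reverse lookups agree on every node and that each host reports the fully qualified name you expect. The commands below are a sketch; node1.example.com and 192.168.1.11 are placeholder values.

# Fully qualified hostname as the local node sees it
hostname -f
# Forward lookup, honouring /etc/hosts and DNS in nsswitch.conf order
getent hosts node1.example.com
# Reverse lookup for the address returned above
getent hosts 192.168.1.11
# Repeat the forward lookup for every node in the cluster
for h in node1.example.com node2.example.com; do getent hosts "$h"; done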
Potential Root Cause: Java Heap Space Too Low
Troubleshooting Steps:
• Examine the heap space properties in yarn-env.sh.
• Examine the settings in Ambari cluster management.
Resolution Steps:
• Adjust the heap space property until the Resource Manager resumes running.
Information to Collect:
• yarn-env.sh from the cluster.
• Screenshot of the Ambari cluster management mapred settings screen.
• Resource Manager log.
• Node Manager log.
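To see what heap the daemons are actually getting, you can grep the env script and the running JVM's command line. The variable names below (YARN_RESOURCEMANAGER_HEAPSIZE, YARN_NODEMANAGER_HEAPSIZE) are the standard Hadoop 2.x settings and /etc/hadoop/conf is an assumed path; verify both against your own yarn-env.sh.

# Heap-related settings currently in the env script
grep -i heapsize /etc/hadoop/conf/yarn-env.sh
# Example change: raise the Resource Manager heap to 2 GB (value is in MB)
# export YARN_RESOURCEMANAGER_HEAPSIZE=2048
# Confirm the -Xmx value the running Resource Manager JVM was started with
ps -ef | grep -i '[r]esourcemanager' | grep -o -- '-Xmx[^ ]*'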
Potential Root Cause: Permissions Not Set Correctly on Local File System
Troubleshooting Steps:
• Examine the permissions on the various directories on the local file system.
• Verify proper ownership (yarn/mapred for MapReduce directories and hdfs for HDFS directories).
Resolution Steps:
• Use the chmod command to change the permissions of the directories to 755.
• Use the chown command to assign the directories to the correct owner (hdfs or yarn/mapred).
• Relaunch the Hadoop daemons using the correct user.
Information to Collect:
• core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml.
• Permissions listing for the directories listed in the above configuration files.
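As a concrete illustration of the chown/chmod steps, the commands below fix up the Node Manager local directory used earlier in this post. The log directory path and the hadoop group are assumptions; take the real paths from yarn-site.xml and the group from your existing installation.

# Give the yarn user ownership of its local and log directories (paths assumed)
chown -R yarn:hadoop /hadoop/yarn/local /hadoop/yarn/log
chmod -R 755 /hadoop/yarn/local /hadoop/yarn/log
# Verify the result before relaunching the daemons as the correct user
ls -ld /hadoop/yarn/local /hadoop/yarn/log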
Potential Root Cause: Insufficient Disk Space
Troubleshooting Steps:
• Verify that there is sufficient space on all system, log, and HDFS partitions.
• Run the df -k command on the Name/DataNodes to verify that there is sufficient capacity on the disk volumes used for storing NameNode or HDFS data.
Resolution Steps:
• Free up disk space on all nodes in the cluster.
-OR-
• Add additional capacity.
Information to Collect:
• Core dumps.
• Linux command: last (login history).
• Dump of file system information.
• Output of the df -k command.
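The commands below show the kind of output worth collecting and a quick way to find what is consuming space; /var/log/hadoop-yarn is an assumed log location, so point du at your actual log and data directories.

# Capacity and usage for every mounted file system, in 1 KB blocks
df -k
# Largest space consumers under the YARN log directory (path assumed)
du -sk /var/log/hadoop-yarn/* 2>/dev/null | sort -nr | head
# Recent login history, as listed under "Information to Collect"
last | head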
Potential Root Cause: Reserved Disk Space Is Set Higher than Free Space
Troubleshooting Steps:
• In hdfs-site.xml, check that the value of the dfs.datanode.du.reserved property is less than the available free space on the drive.
Resolution Steps:
• Configure an appropriate value, or increase free space.
Information to Collect:
• HDFS configuration files.
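Because dfs.datanode.du.reserved is expressed in bytes per volume, it is easiest to compare it against free space reported in bytes. The snippet below is a sketch: /etc/hadoop/conf/hdfs-site.xml and /hadoop/hdfs/data are assumed paths, and df --output requires GNU coreutils.

# Reserved-space setting, in bytes
xmllint --xpath "//property[name='dfs.datanode.du.reserved']/value/text()" /etc/hadoop/conf/hdfs-site.xml
# Available space on the DataNode volume, also in bytes
df -B1 --output=avail /hadoop/hdfs/data
# If the reserved value is larger than the available space, lower it in
# hdfs-site.xml or free up space on the volume before restarting.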