Symptoms may include:
• Process appears to start, but then disappears from the process list.
• Node Manager cannot bind to its interface.
• Kernel panic, halt.

Potential Root Cause: Existing Process Bound to Port

Troubleshooting Steps:
• Examine bound ports to ensure that no other process has already bound to the port.

Resolution Steps:
• Resolve the port conflict before attempting to restart the Resource Manager/Node Manager.

Information to Collect:
• List of bound interfaces/ports and the owning process.
• Resource Manager log.
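The port check above can be sketched with ss from iproute2. The ports 8088 and 8042 are the default Resource Manager and Node Manager web UI ports; your cluster may use different values, so treat them as assumptions and read the real ones from yarn-site.xml:

```shell
# Hypothetical ports to check (8088 = RM web UI default, 8042 = NM web UI default).
for PORT in 8088 8042; do
  # -l listening sockets, -t TCP, -n numeric addresses
  if ss -ltn | grep -q ":$PORT "; then
    echo "CONFLICT: something is already bound to $PORT"
    # -p shows the owning PID/command (may need root for other users' processes)
    ss -ltnp | grep ":$PORT " || true
  else
    echo "port $PORT is free"
  fi
done
```

If a conflict is reported, stop or reconfigure the other process before restarting the YARN daemon.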

Potential Root Cause: Incorrect File Permissions

Troubleshooting Steps:
• Verify that all Hadoop file system permissions are set properly.
• Verify the Hadoop configurations.

Resolution Steps:
• Follow the procedures for handling failure due to file permissions (see Hortonworks KB Solutions/Articles).
• Fix any incorrect configuration.

Information to Collect:
• Dump of file system permissions, ownership, and flags for the Node Manager's local directories. Look up the yarn.nodemanager.local-dirs property in the yarn-site.xml file; if, for example, it has a value of "/hadoop/yarn/local", run from the command line:
ls -lR /hadoop/yarn/local
• Resource Manager log.
• Node Manager log.
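The collection step above can be sketched end to end. A minimal yarn-site.xml fragment is written to /tmp here so the extraction is runnable anywhere; on a real node, point at the actual file (commonly under /etc/hadoop/conf) instead:

```shell
# Illustrative yarn-site.xml fragment (assumption) written to a temp file.
cat > /tmp/yarn-site.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/hadoop/yarn/local</value>
  </property>
</configuration>
EOF
# Pull the property value out with sed (no XML tooling assumed).
DIRS=$(sed -n 's:.*<value>\(.*\)</value>.*:\1:p' /tmp/yarn-site.xml)
echo "$DIRS"    # -> /hadoop/yarn/local
# On the real node, dump permissions recursively:  ls -lR "$DIRS"
```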

Potential Root Cause: Incorrect Name-to-IP Resolution

Troubleshooting Steps:
• Verify that the name/IP resolution is correct for all nodes in the cluster.

Resolution Steps:
• Fix any incorrect configuration.

Information to Collect:
• Local hosts file for all hosts on the system (/etc/hosts).
• Resolver configuration (/etc/resolv.conf).
• Network configuration (/etc/sysconfig/network-scripts/ifcfg-ethX, where X is the number of the interface card).
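A quick resolution check can be sketched as below. The host list is a placeholder (localhost only) so the loop runs anywhere; substitute the real cluster hostnames:

```shell
# Resolve each cluster hostname via the system resolver (hosts file + DNS).
for h in localhost; do          # replace with the real cluster hostnames
  getent hosts "$h" || echo "UNRESOLVED: $h"
done
# On a real node the machine's own FQDN should resolve to a non-loopback address.
hostname -f || true
```

Any "UNRESOLVED" line points at a gap in /etc/hosts or DNS that should be fixed before restarting the daemons.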

Potential Root Cause: Java Heap Space Too Low

Troubleshooting Steps:
• Examine the heap space properties in yarn-env.sh.
• Examine the settings in Ambari cluster management.

Resolution Steps:
• Adjust the heap space property until the Resource Manager resumes running.

Information to Collect:
• yarn-env.sh from the cluster.
• Screenshot of the Ambari MapReduce settings screen.
• Resource Manager log.
• Node Manager log.
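The heap settings in yarn-env.sh look like the fragment below. The 2048/1024 MB values are assumptions for illustration, not recommendations; size them from the node's RAM and the Ambari host configuration:

```shell
# Illustrative yarn-env.sh heap settings (values are assumptions; tune per node).
export YARN_RESOURCEMANAGER_HEAPSIZE=2048   # Resource Manager JVM heap, in MB
export YARN_NODEMANAGER_HEAPSIZE=1024       # Node Manager JVM heap, in MB
```

Raise the value stepwise and restart the daemon after each change until it stays up.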

Potential Root Cause: Permissions Not Set Correctly on Local File System

Troubleshooting Steps:
• Examine the permissions on the various directories on the local file system.
• Verify proper ownership (yarn/mapred for MapReduce directories and hdfs for HDFS directories).

Resolution Steps:
• Use the chmod command to change the permissions of the directories to 755.
• Use the chown command to assign the directories to the correct owner (hdfs or yarn/mapred).
• Relaunch the Hadoop daemons using the correct user.

Information to Collect:
• core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml.
• Permissions listing for the directories listed in the above configuration files.
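The chmod/chown steps can be sketched as follows. This is demonstrated on a scratch directory so it runs anywhere; on a real node the targets are the directories named in the configuration files above, and the path and owner below are assumptions:

```shell
# Stand-in for a Node Manager local directory (assumption).
D=/tmp/demo-yarn-local
mkdir -p "$D"
chmod 755 "$D"                  # rwxr-xr-x, as recommended above
# chown yarn:hadoop "$D"        # requires root and the real service accounts
stat -c '%a %U' "$D"            # show the resulting mode and owner
```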

Potential Root Cause: Insufficient Disk Space

Troubleshooting Steps:
• Verify that there is sufficient space on all system, log, and HDFS partitions.
• Run the df -k command on the Name/DataNodes to verify that there is sufficient capacity on the disk volumes used for storing NameNode or HDFS data.

Resolution Steps:
• Free up disk space on all nodes in the cluster.
-OR-
• Add additional capacity.

Information to Collect:
• Core dumps.
• Linux command: last (login history).
• Dump of file system information.
• Output of the df -k command.
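The capacity check can be sketched as a one-liner to run on every Name/DataNode; the 90% threshold is a hypothetical cutoff, not a Hadoop default:

```shell
# Flag any mount above a 90% usage threshold (threshold is an assumption).
# df -kP gives POSIX-stable columns, so awk can parse them reliably.
df -kP | awk 'NR > 1 {
  use = $5; sub(/%/, "", use)
  if (use + 0 > 90) print "LOW SPACE:", $6, use "%"
}'
```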

Potential Root Cause: Reserved Disk Space Is Set Higher than Free Space

Troubleshooting Steps:
• In hdfs-site.xml, check that the value of the dfs.datanode.du.reserved property is less than the available free space on the drive.

Resolution Steps:
• Configure an appropriate value, or increase free space.

Information to Collect:
• HDFS configuration files.
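The reserved-versus-free comparison can be sketched as below. RESERVED and DATADIR are illustrative stand-ins, not values read from a real hdfs-site.xml; substitute the actual dfs.datanode.du.reserved value (in bytes) and the DataNode data directory:

```shell
RESERVED=1073741824          # 1 GiB -- hypothetical dfs.datanode.du.reserved value
DATADIR=/tmp                 # stand-in for the DataNode data directory
# Available space in KB on the volume holding DATADIR (POSIX df columns).
FREE_KB=$(df -kP "$DATADIR" | awk 'NR==2 {print $4}')
if [ $((FREE_KB * 1024)) -gt "$RESERVED" ]; then
  echo "OK: free space exceeds the reserved value"
else
  echo "PROBLEM: dfs.datanode.du.reserved is at or above free space"
fi
```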
Monday, February 6, 2017
Resource Manager or Node Manager: Fails to Start or Crashes - Server Side Issue