Monday, February 6, 2017

Resource Manager or Node Manager: Fails to Start or Crashes - Server-Side Issue

Symptoms may include:
• Process appears to start, but then disappears from the process list.
• Node Manager cannot bind to interface.
• Kernel panic or halt.
Potential Root Cause: Existing Process Bound to Port
Troubleshooting Steps:
• Examine bound ports to ensure no other process has already bound (example commands below).
Resolution Steps:
• Resolve the port conflict before attempting to restart the Resource Manager/Node Manager.
Information to Collect:
• List of bound interfaces/ports and the owning process.
• Resource Manager log.
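For example, to check whether another process already owns one of the YARN ports, you can run something like the following on the affected node. The ports shown are common defaults for the Resource Manager web UI and client RPC address; substitute whatever is configured in your yarn-site.xml:
netstat -tulpn | grep -E ':8088|:8032'
lsof -i :8088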
Potential Root Cause: Incorrect File Permissions
Troubleshooting Steps:
• Verify that all Hadoop file system permissions are set properly.
• Verify the Hadoop configurations.
Resolution Steps:
• Follow the procedures for handling failure due to file permissions (see Hortonworks KB Solutions/Articles).
• Fix any incorrect configuration.
Information to Collect:
• Dump of file system permissions, ownership, and flags. Look up the value of the yarn.nodemanager.local-dirs property in the yarn-site.xml file; in this case it has a value of "/hadoop/yarn/local", so from the command line run:
ls -lR /hadoop/yarn/local
• Resource Manager log.
• Node Manager log.
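As a sketch of that collection step, assuming the usual HDP configuration directory /etc/hadoop/conf (adjust the path if your configuration lives elsewhere), you could pull the local-dirs value and then list its permissions:
grep -A1 'yarn.nodemanager.local-dirs' /etc/hadoop/conf/yarn-site.xml
ls -lR /hadoop/yarn/local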
Potential Root Cause: Incorrect Name-to-IP Resolution
Troubleshooting Steps:
• Verify that name/IP resolution is correct for all nodes in the cluster (example checks below).
Resolution Steps:
• Fix any incorrect configuration.
Information to Collect:
• Local hosts file for all hosts on the system (/etc/hosts).
• Resolver configuration (/etc/resolv.conf).
• Network configuration (/etc/sysconfig/network-scripts/ifcfg-ethX, where X = number of the interface card).
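A quick way to spot-check resolution from any node is shown below; node1.example.com is a placeholder, so use your actual cluster hostnames and confirm the results agree with /etc/hosts and DNS:
hostname -f
getent hosts node1.example.com
nslookup node1.example.com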
Potential Root Cause: Java Heap Space Too Low
Troubleshooting Steps:
• Examine the heap space property in yarn-env.sh (example below).
• Examine the settings in Ambari cluster management.
Resolution Steps:
• Adjust the heap space property until the Resource Manager resumes running.
Information to Collect:
• yarn-env.sh from the cluster.
• Screenshot of the Ambari cluster management mapred settings screen.
• Resource Manager log.
• Node Manager log.
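As an illustration, yarn-env.sh typically sets the daemon heap sizes (in MB) through variables like the ones below; the values are examples only and should be tuned for your cluster, or changed through Ambari on Ambari-managed clusters, since Ambari regenerates this file:
export YARN_RESOURCEMANAGER_HEAPSIZE=2048
export YARN_NODEMANAGER_HEAPSIZE=1024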
Potential Root Cause: Permissions Not Set Correctly on the Local File System
Troubleshooting Steps:
• Examine the permissions on the various directories on the local file system.
• Verify proper ownership (yarn/mapred for MapReduce directories and hdfs for HDFS directories).
Resolution Steps:
• Use the chmod command to change the permissions of the directories to 755.
• Use the chown command to assign the directories to the correct owner (hdfs or yarn/mapred), as in the example below.
• Relaunch the Hadoop daemons using the correct user.
Information to Collect:
• core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
• Permissions listing for the directories listed in the above configuration files.
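For example, if the NodeManager local directory is /hadoop/yarn/local (check yarn.nodemanager.local-dirs for the real path) and the yarn user should own it, the fix would look roughly like this; the hadoop group shown is a typical but not universal choice:
chown -R yarn:hadoop /hadoop/yarn/local
chmod -R 755 /hadoop/yarn/local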
Potential Root Cause: Insufficient Disk Space
Troubleshooting Steps:
• Verify that there is sufficient space on all system, log, and HDFS partitions.
• Run the df -k command on the Name/DataNodes to verify that there is sufficient capacity on the disk volumes used for storing NameNode or HDFS data (example below).
Resolution Steps:
• Free up disk space on all nodes in the cluster.
-OR-
• Add additional capacity.
Information to Collect:
• Core dumps.
• Linux command: last (login history).
• Dump of file system information.
• Output of the df -k command.
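For instance, the checks above might be run against the DataNode data directory and the YARN log directory; the paths below are common HDP defaults and should be replaced with the directories actually configured on your nodes:
df -k /hadoop/hdfs/data
du -sk /var/log/hadoop-yarn/*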
Potential Root Cause: Reserved Disk Space Is Set Higher than Free Space
Troubleshooting Steps:
• In hdfs-site.xml, check that the value of the dfs.datanode.du.reserved property is less than the available free space on the drive.
Resolution Steps:
• Configure an appropriate value (example below), or increase free space.
Information to Collect:
• HDFS configuration files.
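As a sketch, the property is set in hdfs-site.xml and is expressed in bytes per volume; the 1 GB value below is only an example, and the right number depends on how much non-HDFS space the node needs:
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>1073741824</value>  <!-- example: reserve 1 GB per volume for non-HDFS use -->
</property>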
