Job Fails to Start

Symptom: Exception When Job Submitted
Potential Root Cause: Mistake in the Job's User Code

Troubleshooting Steps:
• Examine Node Manager/Resource Manager logs and task-logs to find the exact exception.

Resolution Steps:
• Examine the stack-trace for the thrown exception.
• Examine the user code to see if you can spot the error.

Information to Collect:
• Resource Manager log.
• Node Manager log.
• The exception trace that the user has mentioned, and the task logs.
• If possible, get at least a snippet of Java code from the area where the exception was thrown.

Symptom: "No Class Def Found" or Similar Exception When Trying to Start Job
Potential Root Cause 1: Job's .jar File -- or Other .jar File -- Not on Classpath

Troubleshooting Steps:
• Verify that the exception is ClassNotFoundException, NoClassDefFoundError, NoSuchMethodError, or a similar exception.

Resolution Steps:
• Find the .jar file that contains the missing class and add it to the classpath.

Information to Collect:
• The entire command used to submit the job.
• The stack-trace from the Node Manager logs.
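
Before hunting for the .jar file, it can help to confirm which class is actually missing. The following is a minimal sketch, not part of Hadoop: ClasspathCheck and isOnClasspath are hypothetical names, and the check only covers the classpath of the JVM that runs it, which may differ from the classpath the task Containers see.

```java
// Hypothetical helper (not a Hadoop API): reports whether a class can be
// loaded from the classpath of the JVM that runs this check.
final class ClasspathCheck {

    /** Returns true if the named class is visible to this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError also covers NoClassDefFoundError for missing dependencies.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the fully qualified class names from the stack-trace as arguments.
        for (String name : args) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same classpath that is used to submit the job, passing the class name reported in the exception. A "MISSING" result suggests the .jar file that provides the class still needs to be added.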

Potential Root Cause 2: Main Class or Method of the Job Code is not "Public Static"

Troubleshooting Steps:
• Examine the code for the main MRv2 class.

Resolution Steps:
• Set the access modifiers of the main method to "public static".
• Recompile and re-test.

Information to Collect:
• The exact exception thrown by Hadoop.
• The job source code.
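
The modifier check above can also be done with reflection instead of reading the source. This is a sketch under assumptions: MainMethodCheck, GoodDriver, and BadDriver are hypothetical names used only for illustration, and the check says nothing about how Hadoop itself locates the entry point.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper (not a Hadoop API): checks that a driver class
// declares "public static void main(String[])".
final class MainMethodCheck {

    static boolean hasPublicStaticMain(Class<?> cls) {
        try {
            // getMethod only returns public methods, so a non-public main
            // is reported as missing.
            Method main = cls.getMethod("main", String[].class);
            return Modifier.isStatic(main.getModifiers())
                    && main.getReturnType() == void.class;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Illustrative driver with the correct modifiers.
    static class GoodDriver {
        public static void main(String[] args) { }
    }

    // Illustrative driver whose main method is missing "public".
    static class BadDriver {
        static void main(String[] args) { }
    }

    public static void main(String[] args) {
        System.out.println(hasPublicStaticMain(GoodDriver.class)); // prints true
        System.out.println(hasPublicStaticMain(BadDriver.class));  // prints false
    }
}
```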

Job Seems to Hang in Setup

Symptom: Job Seems to Hang and Node Manager Becomes Blacklisted
Potential Root Cause: Too Many Allowed Slots Configured for the System Memory on the Node

Troubleshooting Steps:
• Verify the amount of system memory.
• Calculate the required memory for each configured Container.
• Take into account any other processes running on the node.

Resolution Steps:
• Add up the memory required by all configured Containers and by the other processes on the node. If the total is greater than the memory available on the node, reduce the amount configured in the Container properties.
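
The arithmetic behind this check is short enough to sketch. The class name, method names, and all numbers below are illustrative assumptions, not values read from a real cluster or from any Hadoop configuration.

```java
// Hypothetical helper (not a Hadoop API): compares the memory that the
// configured Containers and other processes need with the node's total memory.
final class NodeMemoryCheck {

    /** Memory needed if every configured Container runs at once, plus other processes. */
    static long requiredMb(int containers, long mbPerContainer, long otherProcessesMb) {
        return containers * mbPerContainer + otherProcessesMb;
    }

    /** True when the requirement exceeds what the node physically has. */
    static boolean isOvercommitted(long nodeTotalMb, int containers,
                                   long mbPerContainer, long otherProcessesMb) {
        return requiredMb(containers, mbPerContainer, otherProcessesMb) > nodeTotalMb;
    }

    public static void main(String[] args) {
        // Illustrative numbers only.
        long nodeTotalMb = 16384;     // 16 GB of system memory
        int containers = 8;           // Containers allowed on the node
        long mbPerContainer = 2048;   // memory configured per Container
        long otherProcessesMb = 2048; // OS, daemons, and anything else on the node

        long required = requiredMb(containers, mbPerContainer, otherProcessesMb);
        System.out.println("required=" + required + " MB, available=" + nodeTotalMb + " MB");
        System.out.println(isOvercommitted(nodeTotalMb, containers, mbPerContainer, otherProcessesMb)
                ? "over-committed: reduce the Container memory settings"
                : "configuration fits on the node");
    }
}
```

With these sample numbers the requirement is 8 × 2048 + 2048 = 18432 MB against 16384 MB available, so the node is over-committed and the Container settings would need to come down.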

Symptom: Job Seems to Hang Without "Blacklisting"
Potential Root Cause: No Node Managers Currently Available

Troubleshooting Steps:
• Verify the number of MRv2 tasks available by looking at:
  <Resource Manager host>:8088/cluster/nodes

Resolution Steps:
• Wait until more Node Managers become available, then see if the job runs.

Information to Collect:
• None until the job actually fails to run, then troubleshoot based on the failure symptom.
Tuesday, February 7, 2017
Common Client-Side Issues