Common Client-Side Issues

Job Fails to Start
Symptom: Exception When Job Submitted
Potential Root Cause: Mistake in the Job's User Code

Troubleshooting Steps:
• Examine the Node Manager/Resource Manager logs and task logs to find the exact exception.
Resolution Steps:
• Examine the stack trace for the thrown exception.
• Examine the user code to see if you can spot the error.
Information to Collect:
• Resource Manager log.
• Node Manager log.
• The exception trace that the user has mentioned, and the task logs.
• If possible, get at least a snippet of Java code from the area where the exception was thrown (a typical example is sketched below).
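The user-code mistakes behind this symptom are usually ordinary runtime exceptions thrown from a map or reduce task. The mapper below is a minimal, hypothetical sketch (the class name, the tab-separated input format, and the counters are illustrative assumptions, not part of the table above) showing the kind of defect that surfaces in the task logs and one defensive way to handle it.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: parses "key<TAB>count" lines.
public class CountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split("\t");

        // A common user-code mistake is assuming every record is well formed.
        // A malformed line would throw ArrayIndexOutOfBoundsException or
        // NumberFormatException here, fail the task, and show up in the task
        // logs as the exception the troubleshooting steps tell you to find.
        if (fields.length < 2) {
            context.getCounter("quality", "malformed_records").increment(1);
            return; // skip bad records instead of crashing the task
        }
        try {
            context.write(new Text(fields[0]),
                    new IntWritable(Integer.parseInt(fields[1].trim())));
        } catch (NumberFormatException e) {
            context.getCounter("quality", "bad_numbers").increment(1);
        }
    }
}

If a task throws instead of counting and skipping, the stack trace collected from the task logs points at the offending line, which is exactly what the resolution steps above rely on.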
| Symptom: "No Class Def Found" | Troubleshooting Steps: | |||||||||||
| or Similar Exception When | • Verify that the exception is ClassNotFound, | |||||||||||
| Trying to Start Job, Potential | NoSuchMethodError, or a similar exception. | |||||||||||
| Root Cause 1: Job's .jar File | Resolution Steps: | |||||||||||
| -- or Other .jar File -- Not on | • Find the .jar file that contains the missing class and | |||||||||||
| Classpath | add it to the classpath. | |||||||||||
| Information to Collect: | ||||||||||||
| • The entire command used to submit the job. | ||||||||||||
| • The stack-trace from the Node Manager logs. | ||||||||||||
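A dependency jar can be put on the classpath in several ways: HADOOP_CLASSPATH covers the client side, and the generic -libjars option ships jars to the tasks when the driver parses its arguments with ToolRunner/GenericOptionsParser. The driver below is a minimal sketch using the Job.addFileToClassPath API; the jar path and job name are made-up examples.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithExtraJar {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "job-with-extra-jar"); // hypothetical job name

        // Ship the driver's own classes with the job.
        job.setJarByClass(SubmitWithExtraJar.class);

        // Add a dependency jar (already uploaded to HDFS) to the task classpath.
        // The path below is illustrative.
        job.addFileToClassPath(new Path("/libs/extra-dependency.jar"));

        // ... set mapper/reducer classes and input/output paths here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}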
Potential Root Cause 2: Main Class or Method of the Job Code Is Not "public static"

Troubleshooting Steps:
• Examine the code for the main MRv2 class.
Resolution Steps:
• Set the access modifiers to "public static" (a correctly declared driver is sketched below).
• Recompile and re-test.
Information to Collect:
• The exact exception thrown by Hadoop.
• The job source code.
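For reference, the JVM only invokes an entry point declared exactly as public static void main(String[] args); any other modifier or signature fails before the job is ever submitted. The skeleton below is a hypothetical driver (class name and job name are illustrative) with the declaration the resolution step asks for.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class WordCountDriver {                       // class must be public

    // The entry point must be public, static, void, and take String[] args;
    // anything else (package-private, non-static, wrong signature) prevents
    // the launcher from starting the job.
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word-count"); // hypothetical job name
        job.setJarByClass(WordCountDriver.class);
        // ... configure mapper, reducer, and input/output paths here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}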
Job Seems to Hang in Setup
Symptom: Job Seems to Hang and Node Manager Becomes Blacklisted
Potential Root Cause: Too Many Allowed Slots Configured for the System Memory on the Node

Troubleshooting Steps:
• Verify the amount of system memory.
• Calculate the required memory for each configured Container.
• Take into account any other processes running on the node.
Resolution Steps:
• Add all of the above. If the total is greater than the total memory available on the node, reduce the amount configured in the Container properties (a worked check is sketched below).
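As a worked illustration of that arithmetic (all figures are assumptions for the example, not values from the table): suppose the node has 64 GB of RAM, about 8 GB is reserved for the OS and the Hadoop daemons, and each Container is sized at 4 GB via the usual mapreduce.map.memory.mb / mapreduce.reduce.memory.mb settings.

// Rough capacity check for one worker node (all figures are example values).
public class NodeMemoryCheck {
    public static void main(String[] args) {
        long systemMemoryMb  = 64 * 1024;  // physical RAM on the node
        long reservedMb      = 8 * 1024;   // OS, DataNode, NodeManager, other daemons

        long containerSizeMb = 4 * 1024;   // e.g. mapreduce.map.memory.mb / mapreduce.reduce.memory.mb
        int  maxContainers   = 20;         // concurrent Containers the node is allowed to run

        long needed    = reservedMb + (long) maxContainers * containerSizeMb; // 8 GB + 80 GB = 88 GB
        long available = systemMemoryMb;                                      // 64 GB

        if (needed > available) {
            // Oversubscribed: this is the situation that gets the Node Manager blacklisted.
            // Either lower yarn.nodemanager.resource.memory-mb (fewer Containers per node)
            // or shrink the per-Container sizes until the sum fits.
            System.out.printf("Oversubscribed by %d MB: reduce Container count or size%n",
                    needed - available);
        } else {
            System.out.println("Configuration fits in physical memory");
        }
    }
}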
Symptom: Job Seems to Hang Without "Blacklisting"
Potential Root Cause: No Node Managers Currently Available

Troubleshooting Steps:
• Verify the number of Node Managers available to run MRv2 tasks by looking at:
  <Resource Manager host>:8088/cluster/nodes
  (a programmatic check is also sketched below).
Resolution Steps:
• Wait until more Node Managers become available, then see if the job runs.
Information to Collect:
• None until the job actually fails to run; then troubleshoot based on the failure symptom.
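Besides the Resource Manager web UI above, the same node information is available programmatically through the YarnClient API. The snippet below is a minimal sketch (default configuration assumed, taken from yarn-site.xml on the classpath) that counts the Node Managers currently in the RUNNING state.

import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListRunningNodeManagers {
    public static void main(String[] args) throws Exception {
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new YarnConfiguration()); // picks up yarn-site.xml from the classpath
        yarn.start();
        try {
            // Only nodes in RUNNING state can actually accept Containers.
            List<NodeReport> running = yarn.getNodeReports(NodeState.RUNNING);
            System.out.println("RUNNING Node Managers: " + running.size());
            for (NodeReport node : running) {
                System.out.println("  " + node.getNodeId()
                        + "  containers=" + node.getNumContainers());
            }
        } finally {
            yarn.stop();
        }
    }
}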