Tuesday, February 7, 2017

Common Client-Side Issues


Job Fails to Start

















Symptom: Exception When Job Submitted
Potential Root Cause: Mistake in the Job's User Code

Troubleshooting Steps:
• Examine the Node Manager/Resource Manager logs and the task logs to find the exact exception.

Resolution Steps:
• Examine the stack trace for the thrown exception.
• Examine the user code to see if you can spot the error.

Information to Collect:
• Resource Manager log.
• Node Manager log.
• The exception trace that the user has mentioned, and the task logs.
• If possible, get at least a snippet of Java code from the area where the exception was thrown.
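
When reading the stack trace, the exception reported at the top is often only a wrapper, and the real mistake sits in the innermost "Caused by" entry. The sketch below shows one way to walk that chain in plain Java. The class RootCauseFinder and the simulated exceptions are hypothetical and are not part of Hadoop.

```java
import java.io.IOException;

// Hypothetical helper, not part of Hadoop: walks an exception's cause chain.
class RootCauseFinder {

    // Follow getCause() links until the innermost exception is reached.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Guard against a self-referencing cause to avoid an endless loop.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Simulated failure: a user-code error wrapped by two outer exceptions.
        Throwable wrapped = new RuntimeException("Task failed",
                new IOException("Error in map()",
                        new NumberFormatException("For input string: \"abc\"")));

        Throwable root = rootCause(wrapped);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());

        // The first frame of the root cause usually points at the failing line.
        StackTraceElement[] frames = root.getStackTrace();
        if (frames.length > 0) {
            System.out.println("  at " + frames[0]);
        }
    }
}
```

In a log file the same information appears as the last "Caused by:" block of the trace, so that is usually the first place to look.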










Symptom: "No Class Def Found" or Similar Exception When Trying to Start Job
Potential Root Cause 1: Job's .jar File, or Another .jar File, Not on Classpath

Troubleshooting Steps:
• Verify that the exception is ClassNotFoundException, NoClassDefFoundError, NoSuchMethodError, or a similar exception.

Resolution Steps:
• Find the .jar file that contains the missing class and add it to the classpath.

Information to Collect:
• The entire command used to submit the job.
• The stack trace from the Node Manager logs.
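
To confirm whether a class is visible on a given classpath, and which .jar file it comes from, a small check like the following can help. This is a minimal sketch using only standard Java reflection. ClasspathCheck is a hypothetical name, and the check only reflects the classpath of the JVM it runs in, which may differ from the classpath of the task containers.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper: reports where a class is loaded from, if it can be found.
class ClasspathCheck {

    // Returns the location of the class, or null if it is not on the classpath.
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running static initializers.
            Class<?> c = Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            URL location = (src == null) ? null : src.getLocation();
            // Classes that ship with the JDK may report no code source.
            return (location == null) ? "(JDK runtime)" : location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Pass the missing class name from the stack trace as the first argument.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        String where = locate(name);
        System.out.println(where == null
                ? name + " is NOT on the classpath"
                : name + " loaded from " + where);
    }
}
```

Run it with the same classpath as the failing job and pass the class name taken from the exception message.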










Potential Root Cause 2: Main Class or Method of the Job Code Is Not "Public Static"

Troubleshooting Steps:
• Examine the code for the main MRv2 class.

Resolution Steps:
• Set the access modifiers of the main class or method to "public static" where they are missing.
• Recompile and re-test.

Information to Collect:
• The exact exception thrown by Hadoop.
• The job source code.
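
The modifier check can also be done programmatically. The sketch below uses standard Java reflection to test whether a class declares a public static void main(String[]) method. MainMethodCheck, GoodDriver, and BadDriver are hypothetical names used only for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper: checks the modifiers on a class's main method.
class MainMethodCheck {

    // True only if the class declares "public static void main(String[])".
    static boolean hasPublicStaticMain(Class<?> c) {
        try {
            Method m = c.getDeclaredMethod("main", String[].class);
            int mods = m.getModifiers();
            return Modifier.isPublic(mods)
                    && Modifier.isStatic(mods)
                    && m.getReturnType() == void.class;
        } catch (NoSuchMethodException e) {
            return false; // no main(String[]) declared at all
        }
    }

    // Correct declaration.
    static class GoodDriver {
        public static void main(String[] args) { }
    }

    // Missing both "public" and "static".
    static class BadDriver {
        void main(String[] args) { }
    }

    public static void main(String[] args) {
        System.out.println("GoodDriver ok: " + hasPublicStaticMain(GoodDriver.class));
        System.out.println("BadDriver ok:  " + hasPublicStaticMain(BadDriver.class));
    }
}
```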


Job Seems to Hang in Setup
















Symptom: Job Seems to Hang and Node Manager Becomes Blacklisted
Potential Root Cause: Too Many Allowed Slots Configured for the System Memory on the Node

Troubleshooting Steps:
• Verify the amount of system memory.
• Calculate the required memory for each configured Container.
• Take into account any other processes running on the node.

Resolution Steps:
• Add up all of the above (the memory for every configured Container plus the other processes on the node). If the total is greater than the total available on the node, you will need to reduce the amount configured in the Container properties.
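
The arithmetic behind this check can be sketched as follows. All figures (a 16 GB node, 10 Containers of 2 GB each, 2 GB for other processes) are made-up example values, and NodeMemoryCheck is a hypothetical helper, not a Hadoop class.

```java
// Hypothetical helper: compares configured Container memory with node memory.
class NodeMemoryCheck {

    // Memory needed by all Containers plus other processes on the node, in MB.
    static long requiredMb(int containers, long mbPerContainer, long otherProcessesMb) {
        return containers * mbPerContainer + otherProcessesMb;
    }

    // True if the configuration asks for more memory than the node has.
    static boolean overcommitted(long requiredMb, long nodeTotalMb) {
        return requiredMb > nodeTotalMb;
    }

    // Largest per-Container size (MB) that still fits on the node.
    static long maxMbPerContainer(int containers, long nodeTotalMb, long otherProcessesMb) {
        return Math.max(0, (nodeTotalMb - otherProcessesMb) / containers);
    }

    public static void main(String[] args) {
        long nodeTotalMb = 16384;     // example: 16 GB of system memory
        int containers = 10;          // example: Containers allowed on the node
        long mbPerContainer = 2048;   // example: memory configured per Container
        long otherProcessesMb = 2048; // example: OS and other daemons

        long required = requiredMb(containers, mbPerContainer, otherProcessesMb);
        System.out.println("Required: " + required + " MB, available: " + nodeTotalMb + " MB");
        if (overcommitted(required, nodeTotalMb)) {
            System.out.println("Overcommitted; per-Container limit that fits: "
                    + maxMbPerContainer(containers, nodeTotalMb, otherProcessesMb) + " MB");
        }
    }
}
```

With these example numbers the node needs 22528 MB but has only 16384 MB, so either the per-Container memory or the number of Containers has to come down.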










Symptom: Job Seems to Hang Without "Blacklisting"
Potential Root Cause: No Node Managers Currently Available

Troubleshooting Steps:
• Verify how many Node Managers are available to run MRv2 tasks by looking at:
<Resource Manager host>:8088/cluster/nodes

Resolution Steps:
• Wait until more Node Managers become available, then see if the job runs.

Information to Collect:
• None until the job actually fails to run; then troubleshoot based on the failure symptom.
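
The same check can be scripted instead of opening the web page. The sketch below assumes the Resource Manager also serves its REST API on port 8088 and that /ws/v1/cluster/metrics returns JSON containing an "activeNodes" field. Verify both against your Hadoop version. ActiveNodeCount and the default host name are hypothetical, and the regular expression is a shortcut that stands in for a real JSON parser.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the number of active Node Managers from the RM.
class ActiveNodeCount {

    private static final Pattern ACTIVE =
            Pattern.compile("\"activeNodes\"\\s*:\\s*(\\d+)");

    // Assumed REST endpoint on the Resource Manager web port.
    static String metricsUrl(String rmHost) {
        return "http://" + rmHost + ":8088/ws/v1/cluster/metrics";
    }

    // Extracts the activeNodes value, or -1 if the field is absent.
    static int parseActiveNodes(String json) {
        Matcher m = ACTIVE.matcher(json);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost"; // placeholder host
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(metricsUrl(host)).toURL().openConnection();
        conn.setRequestProperty("Accept", "application/json");
        try (InputStream in = conn.getInputStream()) {
            String body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            int active = parseActiveNodes(body);
            System.out.println(active <= 0
                    ? "No active Node Managers reported; the job will wait."
                    : "Active Node Managers: " + active);
        } finally {
            conn.disconnect();
        }
    }
}
```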
