Common Client-Side Issues

Job Fails to Start
Symptom: Exception When Job Submitted
Potential Root Cause: Mistake in the Job's User Code

Troubleshooting Steps:
• Examine the Node Manager/Resource Manager logs and task logs to find the exact exception.
Resolution Steps:
• Examine the stack trace for the thrown exception.
• Examine the user code to see if you can spot the error.
Information to Collect:
• Resource Manager log.
• Node Manager log.
• The exception trace that the user has mentioned, and the task logs.
• If possible, get at least a snippet of Java code from the area where the exception was thrown (a typical example is sketched below).
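The user-code mistakes behind this symptom are usually ordinary runtime exceptions thrown from a map or reduce task. The mapper below is a minimal, hypothetical sketch (the class name, the tab-separated input format, and the counters are illustrative assumptions, not part of the table above) showing the kind of defect that surfaces in the task logs and one defensive way to handle it.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: parses "key<TAB>count" lines.
public class CountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split("\t");

        // A common user-code mistake is assuming every record is well formed.
        // A malformed line would throw ArrayIndexOutOfBoundsException or
        // NumberFormatException here, fail the task, and show up in the task
        // logs as the exception the troubleshooting steps tell you to find.
        if (fields.length < 2) {
            context.getCounter("quality", "malformed_records").increment(1);
            return; // skip bad records instead of crashing the task
        }
        try {
            context.write(new Text(fields[0]),
                    new IntWritable(Integer.parseInt(fields[1].trim())));
        } catch (NumberFormatException e) {
            context.getCounter("quality", "bad_numbers").increment(1);
        }
    }
}

If a task throws instead of counting and skipping, the stack trace collected from the task logs points at the offending line, which is exactly what the resolution steps above rely on.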
| Symptom: "No Class Def Found" | Troubleshooting Steps: | |||||||||||
| or Similar Exception When | • Verify that the exception is ClassNotFound, | |||||||||||
| Trying to Start Job, Potential | NoSuchMethodError, or a similar exception. | |||||||||||
| Root Cause 1: Job's .jar File | Resolution Steps: | |||||||||||
| -- or Other .jar File -- Not on | • Find the .jar file that contains the missing class and | |||||||||||
| Classpath | add it to the classpath. | |||||||||||
| Information to Collect: | ||||||||||||
| • The entire command used to submit the job. | ||||||||||||
| • The stack-trace from the Node Manager logs. | ||||||||||||
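A dependency jar can be put on the classpath in several ways: HADOOP_CLASSPATH covers the client side, and the generic -libjars option ships jars to the tasks when the driver parses its arguments with ToolRunner/GenericOptionsParser. The driver below is a minimal sketch using the Job.addFileToClassPath API; the jar path and job name are made-up examples.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithExtraJar {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "job-with-extra-jar"); // hypothetical job name

        // Ship the driver's own classes with the job.
        job.setJarByClass(SubmitWithExtraJar.class);

        // Add a dependency jar (already uploaded to HDFS) to the task classpath.
        // The path below is illustrative.
        job.addFileToClassPath(new Path("/libs/extra-dependency.jar"));

        // ... set mapper/reducer classes and input/output paths here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}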
Potential Root Cause 2: Main Class or Method of the Job Code Is Not "public static"

Troubleshooting Steps:
• Examine the code for the main MRv2 class.
Resolution Steps:
• Set the access modifiers to "public static" (a correctly declared driver is sketched below).
• Recompile and re-test.
Information to Collect:
• The exact exception thrown by Hadoop.
• The job source code.
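For reference, the JVM only invokes an entry point declared exactly as public static void main(String[] args); any other modifier or signature fails before the job is ever submitted. The skeleton below is a hypothetical driver (class name and job name are illustrative) with the declaration the resolution step asks for.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class WordCountDriver {                       // class must be public

    // The entry point must be public, static, void, and take String[] args;
    // anything else (package-private, non-static, wrong signature) prevents
    // the launcher from starting the job.
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word-count"); // hypothetical job name
        job.setJarByClass(WordCountDriver.class);
        // ... configure mapper, reducer, and input/output paths here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}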
Job Seems to Hang in Setup
Symptom: Job Seems to Hang and Node Manager Becomes Blacklisted
Potential Root Cause: Too Many Allowed Slots Configured for the System Memory on the Node

Troubleshooting Steps:
• Verify the amount of system memory.
• Calculate the required memory for each configured Container.
• Take into account any other processes running on the node.
Resolution Steps:
• Add all of the above. If the total is greater than the total memory available on the node, reduce the amount configured in the Container properties (a worked check is sketched below).
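As a worked illustration of that arithmetic (all figures are assumptions for the example, not values from the table): suppose the node has 64 GB of RAM, about 8 GB is reserved for the OS and the Hadoop daemons, and each Container is sized at 4 GB via the usual mapreduce.map.memory.mb / mapreduce.reduce.memory.mb settings.

// Rough capacity check for one worker node (all figures are example values).
public class NodeMemoryCheck {
    public static void main(String[] args) {
        long systemMemoryMb  = 64 * 1024;  // physical RAM on the node
        long reservedMb      = 8 * 1024;   // OS, DataNode, NodeManager, other daemons

        long containerSizeMb = 4 * 1024;   // e.g. mapreduce.map.memory.mb / mapreduce.reduce.memory.mb
        int  maxContainers   = 20;         // concurrent Containers the node is allowed to run

        long needed    = reservedMb + (long) maxContainers * containerSizeMb; // 8 GB + 80 GB = 88 GB
        long available = systemMemoryMb;                                      // 64 GB

        if (needed > available) {
            // Oversubscribed: this is the situation that gets the Node Manager blacklisted.
            // Either lower yarn.nodemanager.resource.memory-mb (fewer Containers per node)
            // or shrink the per-Container sizes until the sum fits.
            System.out.printf("Oversubscribed by %d MB: reduce Container count or size%n",
                    needed - available);
        } else {
            System.out.println("Configuration fits in physical memory");
        }
    }
}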
Symptom: Job Seems to Hang Without "Blacklisting"
Potential Root Cause: No Node Managers Currently Available

Troubleshooting Steps:
• Verify the number of Node Managers available to run MRv2 tasks by looking at:
  <Resource Manager host>:8088/cluster/nodes
  (a programmatic check is also sketched below).
Resolution Steps:
• Wait until more Node Managers become available, then see if the job runs.
Information to Collect:
• None until the job actually fails to run; then troubleshoot based on the failure symptom.
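Besides the Resource Manager web UI above, the same node information is available programmatically through the YarnClient API. The snippet below is a minimal sketch (default configuration assumed, taken from yarn-site.xml on the classpath) that counts the Node Managers currently in the RUNNING state.

import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListRunningNodeManagers {
    public static void main(String[] args) throws Exception {
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new YarnConfiguration()); // picks up yarn-site.xml from the classpath
        yarn.start();
        try {
            // Only nodes in RUNNING state can actually accept Containers.
            List<NodeReport> running = yarn.getNodeReports(NodeState.RUNNING);
            System.out.println("RUNNING Node Managers: " + running.size());
            for (NodeReport node : running) {
                System.out.println("  " + node.getNodeId()
                        + "  containers=" + node.getNumContainers());
            }
        } finally {
            yarn.stop();
        }
    }
}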