Hadoop: Hadoop Admin Interview Question 1

Saturday, July 16, 2016

Hadoop Admin Interview Question 1

1: Can you describe your current roles and responsibilities, or your day-to-day activities?
2: Please describe the YARN architecture.
3: What is NameNode heap memory, and how do you configure it?
4: How do you install a Hadoop cluster? Please describe in detail which services and components you install during Hadoop installation.
5: How do you enable the repository during installation, and what details do you provide there?
6: What is the metastore, and how do you connect to it?
7: If the Hive metastore service is down, what is the impact on the Hadoop cluster?
8: Do we install the Hive service on every node in a Hadoop cluster?
9: What is Beeline?
10: What is HiveServer2?
11: How do you connect to Hive through Beeline?
12: What is a Thrift client?
13: What are the JobTracker and the ResourceManager?
14: What is the use of ZooKeeper services, and why do we need them?
15: How do you troubleshoot if the NameNode is down in Hadoop version 1, and in Hadoop version 2?
16: How do you troubleshoot if some services are down in a Hadoop cluster?
17: How do you troubleshoot a slow-running job?
18: What are the benefits of using YARN?
19: Is it possible to run MRv1 and MRv2 on a single cluster?
20: What is the FIFO scheduler?
21: What is the Capacity Scheduler?
22: What is the difference between the FIFO scheduler and the Capacity Scheduler?
23: How do you execute a job on a cluster using the FIFO scheduler?
24: How do you identify a long-running job in a large, busy cluster?
25: How do you kill a Hadoop job if the cluster is configured with the Capacity Scheduler?
26: What is a Kerberos realm, and how do you define it?
27: How do you define and create a Kerberos principal?
28: How do you add a new user to a Hadoop cluster?
29: How do you define a user's permissions for a particular directory in a Hadoop cluster?
30: How do we decide the heap memory limit for Hadoop?
31: How do we decide the heap memory limit for the NameNode?
32: How do you increase the NameNode heap memory?
33: What is a Standby NameNode, and what is a high-availability Hadoop cluster?
34: How do you resolve a connectivity issue between the Active NameNode and the Standby NameNode? What is the impact on the Hadoop cluster, and will the Standby NameNode try to become active?
35: A few DataNodes are running slowly. What is the impact on the jobs running on those DataNodes, and what is the impact on overall cluster performance?
36: What is the difference between a dead node and a blacklisted node, and how does a node become blacklisted?
37: How does the NameNode decide which node is dead?
38: What is speculative execution? What does it do?
39: How do you schedule jobs in a Hadoop cluster?
40: Which version of MapReduce are you using?
41: What is the difference between MapReduce version 1 and MapReduce version 2?
42: How do you identify a long-running job, and how do you troubleshoot it?
43: How do you kill a job?
44: How do you add a service or install a component in an existing Hadoop cluster?
45: How do you restart the NameNode?
46: How do you add or remove a DataNode in a Hadoop cluster? What are the steps, and which files do you edit?
47: What is Hive, and what work have you done with it?
48: What is Oozie, and how do you use it?
49: What are the schedulers available in Hadoop?
50: When you submit a Spark job in Hadoop 2.x, how does Spark interact with YARN, and how are resources negotiated between Spark and YARN?
51: What is the SparkContext? What is it used for?
52: Why can a Spark job run only on Hadoop 2.x and not on 1.x?
53: What is the default YARN scheduler?
54: How do jobs get scheduled in YARN? Which component is responsible for it? How do containers handle resource allocation in YARN?
55: If you submit a Spark job to a Hadoop cluster, how do containers handle resource negotiation for the Spark job?
56: How do you troubleshoot if a DataNode is down? Which log files do you check?
57: How do you increase the storage capacity of a Hadoop cluster?
58: What happens after adding a new DataNode to a Hadoop cluster?
59: What is the balancer, and how do you schedule it?
60: You try to log in to a machine in your cluster and get a timeout exception. What could be causing it? What steps would you take to resolve it?
61: How do you start a process in Linux?
62: In which cases is speculative execution not beneficial?
63: When we run a MapReduce job, what processes are involved on the mapper side before the data goes to the reducer?