Questions tagged [hadoop2]

Hadoop 2 represents the second generation of the popular open source distributed platform Apache Hadoop.

Apache Hadoop 2.x introduces significant improvements over the previous stable release line, Hadoop 1.x. Major enhancements have been made to both of Hadoop's core building blocks, HDFS and MapReduce:

  1. HDFS Federation: To scale the name service horizontally, federation uses multiple independent Namenodes/namespaces.

  2. MapReduce NextGen, aka YARN, aka MRv2: The new architecture divides the two major functions of the JobTracker, resource management and job life-cycle management, into separate components. The new ResourceManager manages the global assignment of compute resources to applications, and the per-application ApplicationMaster manages the application's scheduling and coordination. An application is either a single job in the sense of classic MapReduce jobs or a DAG of such jobs. The ResourceManager and the per-machine NodeManager daemon, which manages the user processes on that machine, form the computation fabric. A minimal client-side sketch of these pieces follows this list.
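
To make the ResourceManager/ApplicationMaster split above concrete, here is a minimal, hedged Java sketch of the client side of a YARN submission. It assumes the Hadoop YARN client jars are on the classpath and that yarn-site.xml points at a running ResourceManager; it only requests a new application id and does not submit a real ApplicationMaster.

```java
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClientSketch {
    public static void main(String[] args) throws Exception {
        // Reads yarn-site.xml from the classpath to find the ResourceManager.
        YarnConfiguration conf = new YarnConfiguration();

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // The ResourceManager hands back a new application id; an ApplicationMaster
        // submitted under this id would then negotiate containers from the
        // per-machine NodeManagers.
        YarnClientApplication app = yarnClient.createApplication();
        System.out.println("Application id: "
                + app.getNewApplicationResponse().getApplicationId());

        yarnClient.stop();
    }
}
```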

For more information on Hadoop 2, visit the official Hadoop 2 homepage.

2047 questions
318
votes
24 answers

Hadoop "Unable to load native-hadoop library for your platform" warning

I'm currently configuring hadoop on a server running CentOs. When I run start-dfs.sh or stop-dfs.sh, I get the following error: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where…
Olshansky
  • 5,904
  • 8
  • 32
  • 47
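
As a hedged aside on this question: the warning only means that Hadoop fell back to its pure-Java implementations because libhadoop could not be found on java.library.path. A small diagnostic sketch (assuming the hadoop-common jar is on the classpath) that reports whether the native library actually loaded:

```java
import org.apache.hadoop.util.NativeCodeLoader;

public class NativeCheck {
    public static void main(String[] args) {
        // false means Hadoop fell back to the built-in Java classes,
        // which is exactly what the WARN message reports.
        System.out.println("native hadoop loaded: "
                + NativeCodeLoader.isNativeCodeLoaded());

        // The native library is searched on java.library.path; printing it helps
        // verify that $HADOOP_HOME/lib/native is actually on that path.
        System.out.println("java.library.path = "
                + System.getProperty("java.library.path"));
    }
}
```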
42
votes
2 answers

Spark Unable to load native-hadoop library for your platform

I'm a dummy on Ubuntu 16.04, desperately attempting to make Spark work. I've tried to fix my problem using the answers found here on stackoverflow but I couldn't resolve anything. Launching spark with the command ./spark-shell from bin folder I get…
cane_mastino
  • 421
  • 1
  • 4
  • 4
42
votes
11 answers

Datanode does not start correctly

I am trying to install Hadoop 2.2.0 in pseudo-distributed mode. When I try to start the datanode service, it shows the following error; can anyone please tell me how to resolve this? 2014-03-11 08:48:15,916 INFO…
user2631600
  • 759
  • 1
  • 11
  • 18
38
votes
14 answers

There are 0 datanode(s) running and no node(s) are excluded in this operation

I have set up a multi-node Hadoop cluster. The NameNode and Secondary NameNode run on the same machine, and the cluster has only one DataNode. All the nodes are configured on Amazon EC2 machines. Following are the configuration files on the master…
Learner
  • 449
  • 1
  • 7
  • 16
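
A hedged diagnostic sketch for this kind of "no datanodes" error, roughly equivalent to running hdfs dfsadmin -report: it asks the NameNode how many DataNodes have registered. It assumes the Hadoop client configuration on the classpath points at the cluster's NameNode.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DatanodeReportSketch {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        if (fs instanceof DistributedFileSystem) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // Each entry is a DataNode currently registered with the NameNode;
            // an empty array matches the "0 datanode(s) running" message.
            DatanodeInfo[] live = dfs.getDataNodeStats();
            System.out.println("Datanodes reporting to the NameNode: " + live.length);
            for (DatanodeInfo dn : live) {
                System.out.println(dn.getHostName() + "\n" + dn.getDatanodeReport());
            }
        }
    }
}
```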
34
votes
5 answers

Permission Denied error while running start-dfs.sh

I am getting this error while performing start-dfs.sh Starting namenodes on [localhost] pdsh@Gaurav: localhost: rcmd: socket: Permission denied Starting datanodes pdsh@Gaurav: localhost: rcmd: socket: Permission denied Starting secondary namenodes…
Gaurav A Dubey
  • 641
  • 1
  • 6
  • 19
34
votes
5 answers

How can I access S3/S3n from a local Hadoop 2.6 installation?

I am trying to reproduce an Amazon EMR cluster on my local machine. For that purpose, I have installed the latest stable version of Hadoop as of now - 2.6.0. Now I would like to access an S3 bucket, as I do inside the EMR cluster. I have added the…
doublebyte
  • 1,225
  • 3
  • 13
  • 22
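
A hedged sketch of one way to reach S3 from a local Hadoop 2.6 client, assuming the hadoop-aws jar and its AWS SDK dependency have been added to the classpath; the bucket name and credentials below are placeholders, and fs.s3a.access.key / fs.s3a.secret.key are the corresponding property names if the newer s3a connector is used instead.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3ListSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // s3n credential properties (older connector); values are placeholders.
        conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
        conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

        // "my-bucket" is a hypothetical bucket used only for illustration.
        FileSystem fs = FileSystem.get(URI.create("s3n://my-bucket/"), conf);
        for (FileStatus status : fs.listStatus(new Path("s3n://my-bucket/"))) {
            System.out.println(status.getPath());
        }
    }
}
```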
30
votes
4 answers

Amazon EMR - Why do we need Task nodes when we have Core nodes?

I have been learning about Amazon EMR lately, and as I understand it the EMR cluster lets us choose 3 node types. Master, which runs the primary Hadoop daemons like NameNode, JobTracker and ResourceManager. Core, which runs DataNode and TaskTracker…
Taher Koitawala
  • 301
  • 1
  • 3
  • 6
25
votes
3 answers

Hadoop namenode: Single point of failure

The Namenode in the Hadoop architecture is a single point of failure. How do people who have large Hadoop clusters cope with this problem? Is there an industry-accepted solution that has worked well wherein a secondary Namenode takes over in case…
rakeshr
  • 1,027
  • 3
  • 17
  • 25
25
votes
7 answers

How to specify AWS Access Key ID and Secret Access Key as part of an Amazon s3n URL

I am passing input and output folders as parameters to a MapReduce word count program from a webpage. I am getting the error below: HTTP Status 500 - Request processing failed; nested exception is java.lang.IllegalArgumentException: AWS Access Key ID and…
user3795951
  • 321
  • 2
  • 5
  • 7
21
votes
3 answers

How to fix Hadoop WARNING: An illegal reflective access operation has occurred error on Ubuntu

I have installed Java openjdk version "10.0.2" and Hadoop 2.9.0 successfully. All processes are running well: hadoopusr@amalendu:~$ jps 19888 NameNode 20388 DataNode 20898 NodeManager 20343 SecondaryNameNode 20539 ResourceManager 21118 Jps But when…
Amalendu Kar
  • 458
  • 1
  • 6
  • 17
21
votes
11 answers

DataNode is not starting in single-node Hadoop 2.6.0

I installed Hadoop 2.6.0 on my laptop running Ubuntu 14.04 LTS. I successfully started the Hadoop daemons by running start-all.sh and ran a WordCount example successfully, then I tried to run a jar example that didn't work for me, so I decided to…
Firas M. Awaysheh
  • 211
  • 1
  • 2
  • 3
20
votes
2 answers

How to tune a Spark job on EMR to write huge data quickly to S3

I have a Spark job where I am doing an outer join between two data frames. The first data frame is 260 GB of text files split into 2200 files, and the second data frame is 2 GB. I am then writing the data frame output which is…
Sudarshan kumar
  • 1,503
  • 4
  • 36
  • 83
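
A hedged Java sketch of the usual shape of such a job, under assumed input paths, an assumed join column "id", and an assumed partition count; controlling the number and size of the files written to S3 (by repartitioning before the write) is typically the first tuning lever.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class S3WriteSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("s3-write-sketch")
                .getOrCreate();

        // Placeholder inputs: the real job reads ~260 GB of text split across
        // ~2200 files plus a ~2 GB side input; parquet is used here only to
        // keep the sketch short.
        Dataset<Row> big = spark.read().parquet("s3a://my-bucket/big-input/");
        Dataset<Row> small = spark.read().parquet("s3a://my-bucket/small-input/");

        // "id" is a hypothetical join key used only for illustration.
        Dataset<Row> joined = big.join(small,
                big.col("id").equalTo(small.col("id")), "outer");

        // Repartitioning before the write controls how many files land in S3;
        // thousands of tiny files and a handful of huge ones both slow the
        // upload down, so the count (400 here) is something to tune.
        joined.repartition(400)
              .write()
              .mode(SaveMode.Overwrite)
              .parquet("s3a://my-bucket/output/");

        spark.stop();
    }
}
```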
20
votes
6 answers

Hadoop release missing /conf directory

I am trying to install a single node setup of Hadoop on Ubuntu. I started following the instructions in the Hadoop 2.3 docs, but I seem to be missing something very simple. First, the docs say: To get a Hadoop distribution, download a recent stable…
Sanketh Katta
  • 5,961
  • 2
  • 29
  • 30
20
votes
7 answers

Name node vs. secondary name node

Hadoop is consistent and partition-tolerant, i.e. it falls under the CP category of the CAP theorem. Hadoop is not highly available because all the nodes depend on the name node: if the name node fails, the cluster goes down. But considering the fact…
Sam
  • 2,545
  • 8
  • 38
  • 59
19
votes
1 answer

How does the Hadoop Namenode failover process work?

Hadoop: The Definitive Guide says: Each Namenode runs a lightweight failover controller process whose job it is to monitor its Namenode for failures (using a simple heartbeat mechanism) and trigger a failover should a Namenode fail. How come a…
K246
  • 1,077
  • 1
  • 8
  • 14