Some major refactoring is happening in Hadoop around MapReduce. Details can be found in the following JIRA:

https://issues.apache.org/jira/browse/MAPREDUCE-279

The refactored code has ResourceManager, NodeManager and HistoryServer daemons. Has anyone tried running them in Eclipse? That would make development and debugging easier.

I sent a mail to the Hadoop forums, but no one there has tried it. I wanted to check whether anyone on Stack Overflow has done something similar.

Praveen Sripati

2 Answers

I have been trying to run YARN (the next generation of MapReduce) on my host for several days.

First, get the source code from apache.org using svn or git. Taking svn as an example:

svn co https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.23.0

Then generate the Eclipse-related files using Maven (you should set up Maven 3 on your host before this step):

mvn test -DskipTests

mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
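Since the steps above depend on Maven 3, a quick sanity check before running them may save some time. This is only a minimal sketch: `maven_major` is a hypothetical helper (not part of Maven or Hadoop), and it assumes the first line of `mvn -version` output starts with "Apache Maven" followed by the version number.

```shell
# Hypothetical helper (not part of Hadoop or Maven): reads `mvn -version`
# output on stdin and prints the major version number, or nothing if no
# line of the form "Apache Maven X.Y..." is found.
maven_major() {
  sed -n 's/^Apache Maven \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Only query Maven if it is actually on the PATH.
if command -v mvn >/dev/null 2>&1; then
  major=$(mvn -version 2>/dev/null | maven_major)
  if [ "$major" = "3" ]; then
    echo "Maven 3 found"
  else
    echo "Expected Maven 3, found major version: ${major:-unknown}" >&2
  fi
fi
```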

Now you can import the existing Maven project into Eclipse (you should install the Maven plugin in Eclipse first).

In Eclipse: File -> Import existing Maven projects

Choose "Existing Projects into Workspace"
Select the hadoop-mapreduce-project directory as the root directory
Select the hadoop-mapreduce-project project
Click "Finish"

I had to try many times because the classpath/build path was not configured correctly and did not include all the dependency packages and classes. If you hit the same problem, try "Add External Class Folder" under the project's Properties and select the build directory of the current project.


Update (2012-03-15):

I can now run YARN (the same as Hadoop 0.23) in Eclipse.

First, you should compile and build YARN successfully by running:

mvn clean package -Pdist -Dtar -DskipTests

Because I only care about debugging YARN, I run HDFS on my single host from the Linux terminal, not in Eclipse:

bin/hdfs namenode -format -clusterid your_hdfs_id
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
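Before moving on to Eclipse, it can help to confirm that both HDFS daemons actually came up. The sketch below is an assumption-laden convenience, not part of Hadoop: `missing_daemons` is a hypothetical helper that reads `jps` output (lines of the form "pid ClassName") on stdin and prints whichever of NameNode and DataNode is absent.

```shell
# Hypothetical helper (not part of Hadoop): reads `jps` output on stdin and
# prints the name of each expected HDFS daemon that is NOT listed.
missing_daemons() {
  jps_output=$(cat)
  for daemon in NameNode DataNode; do
    # Match " NameNode" at end of line so "SecondaryNameNode" does not count.
    if ! printf '%s\n' "$jps_output" | grep -q " ${daemon}\$"; then
      printf '%s\n' "$daemon"
    fi
  done
}

# Only run the check if the JDK's jps tool is on the PATH.
if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons
fi
```

If nothing is printed, both daemons appear in the process list; otherwise the daemon logs are the place to look next.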

Then import Hadoop 0.23 into Eclipse and find ResourceManager.java. The next step is to run this class in Eclipse. Detailed steps:

  • Right-click the class and select Run As > Java Application.
  • Add a new run configuration for this class and, in the Arguments section, fill in:

    --config your_yarn_conf_dir (the same as the HDFS conf dir)

  • Click the Run button; the ResourceManager output will appear in the Eclipse console.

Running the NodeManager in Eclipse works the same way as running the ResourceManager: add a new run configuration, fill in the arguments with "--config your_yarn_conf_dir", then press the Run button.
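Because both run configurations point at the same conf dir, a quick check that the directory looks complete can rule out one common cause of startup failures. This is a rough sketch only: `check_conf_dir` is a hypothetical helper, and the assumption that the directory should contain core-site.xml and yarn-site.xml is mine, based on the usual Hadoop 0.23 layout.

```shell
# Hypothetical helper (not part of Hadoop): returns 0 if the given directory
# contains the config files assumed to be needed, 1 otherwise.
check_conf_dir() {
  conf_dir=$1
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing: $conf_dir/$f" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `check_conf_dir your_yarn_conf_dir && echo "conf dir looks usable"` can be run before starting either daemon from Eclipse.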

Happy Coding~!

nourlcn
  • I have already setup the 0.23 projects in Eclipse, what I am interested in is starting and debugging the daemons in Eclipse. – Praveen Sripati Dec 24 '11 at 02:51
  • Sorry, I have never run the YARN daemons in Eclipse, only from the console. I also want to know how to debug YARN step by step in Eclipse. I will try to work it out by analyzing the yarn-daemons.sh file over the next few days. @PraveenSripati – nourlcn Dec 26 '11 at 08:33

Nourl, wait for https://issues.apache.org/jira/browse/MAPREDUCE-3131 to complete. Anyway, you can check out the revision and try running it.

You will need to run mvn site:site to generate the documentation, which covers everything. To figure out how it works, you can also open the debug.sh script and see for yourself.

Basically, we pass JAVA_OPTIONS and specify the Eclipse remote debug parameters. It gets tricky for child processes, because for those you need to set the mapred.child.java.opts property.
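To make the remote-debug idea concrete, here is a minimal sketch of what such options might look like. It is not the content of debug.sh: `jdwp_opts` is a hypothetical helper, the ports 8000 and 8001 are arbitrary, and using the `YARN_RESOURCEMANAGER_OPTS` environment variable to hand the options to the daemon is my assumption about the bin/yarn script. The JDWP agent string itself is the standard JVM one.

```shell
# Hypothetical helper: prints standard JVM remote-debug (JDWP) options.
#   $1 = port the JVM listens on for the debugger
#   $2 = "y" to make the JVM wait for the debugger to attach, "n" (default) otherwise
jdwp_opts() {
  port=$1
  suspend=${2:-n}
  printf '%s' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=${suspend},address=${port}"
}

# Assumption: the yarn launcher script appends this variable to the
# ResourceManager's JVM options.
YARN_RESOURCEMANAGER_OPTS=$(jdwp_opts 8000)
export YARN_RESOURCEMANAGER_OPTS

# For task JVMs, the same kind of string goes into mapred.child.java.opts.
# Note this replaces whatever value the property had before.
CHILD_DEBUG_PROPERTY="-Dmapred.child.java.opts=$(jdwp_opts 8001 y)"
echo "$CHILD_DEBUG_PROPERTY"
```

In Eclipse, a "Remote Java Application" debug configuration pointed at the chosen host and port would then attach to the running daemon. With a fixed port, only one child JVM per host can bind it at a time, which is part of what makes the child-process case tricky.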

HTH

-P

Prashant Sharma
  • The query was about running (not project setup) YARN in Eclipse, were you able to do this? – Praveen Sripati Jan 23 '12 at 13:28
  • Well, if you look more closely: yes! I accept that my approach is not quite right, but it does exactly what you want. Check out the right revision and let me know if you run into trouble. – Prashant Sharma Mar 04 '12 at 05:42