
I am new to Hadoop. I have a Hadoop cluster with HBase configured on 3 Linux machines. I created new tables and scanned their data with a Java program run from a remote Windows machine using the Eclipse IDE. Now I cannot execute a MapReduce job remotely; it fails with an error. The same job runs fine when I execute it directly on a Hadoop cluster machine.

Hadoop version: hadoop-2.5.1, HBase version: hbase-0.98.3-hadoop2

Can somebody tell me how to actually run the job remotely?

In Eclipse, the configuration settings are as follows:

static Configuration conf = HBaseConfiguration.create();

static {
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    conf.set("hbase.zookeeper.quorum", "192.168.10.152");

    conf.set("hbase.nameserver.address", "192.168.10.152");
    conf.set("hadoop.job.ugi", "root");
    conf.set("fs.defaultFS", "hdfs://192.168.10.152:9000");
    conf.set ("mapreduce.framework.name", "yarn");  
    conf.set("yarn.resourcemanager.address", "192.168.10.152:8032");
    conf.set("mapred.job.tracker", "192.168.10.152:54311");

}
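
For context, when the submitting JVM runs on Windows and the cluster runs on Linux, a couple of additional client-side properties are often needed on top of the above. This is only a sketch using standard Hadoop 2.x property names; it is not taken from my current setup:

    // Sketch only: extra settings commonly needed for cross-platform job
    // submission from a Windows client to a Linux cluster (verify the values
    // against your own cluster).
    conf.set("mapreduce.app-submission.cross-platform", "true"); // platform-neutral classpath expansion
    conf.set("yarn.resourcemanager.hostname", "192.168.10.152"); // lets YARN derive the other RM addresses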

In the Hadoop cluster, the configuration files are given below:

hdfs-site.xml

 <property>
     <name>dfs.replication</name>
     <value>2</value>
 </property>
 <property>
     <name>dfs.name.dir</name>
     <value>/root/demo/meta/name</value>
 </property>
 <property>
     <name>dfs.data.dir</name>
     <value>/root/demo/meta/hadoop_data</value>
 </property>
 <property>
     <name>fs.checkpoint.dir</name>
     <value>/root/demo/meta/secondary_name</value>
 </property>

<property>
     <name>dfs.support.broken.append</name>
     <value>false</value>
     <description>Does HDFS allow appends to files?
     This is currently set to false because there are bugs in the
     "append code" and it is not supported in any production cluster.
     </description>
</property> 

core-site.xml

<property>
    <name>hadoop.tmp.dir</name>
    <value>/root/demo/meta/hadoop_tmp</value>
    <description>A base for other temporary directories.</description>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hmaster:9000</value>
</property>
<property>
        <name>io.file.buffer.size</name>
        <value>65536</value>
</property>
<property>
    <name>ipc.server.tcpnodelay</name>
    <value>true</value>
</property>

mapred-site.xml

  <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
  </property>

 <property>
      <name>mapred.job.tracker</name>
      <value>hmaster:54311</value>
 </property>

 <property>
      <name>mapred.system.dir</name>
      <value>file:/root/demo/meta/mapred/system</value>
      <final>true</final>
 </property>
 <property>
     <name>mapred.local.dir</name>
     <value>file:/root/demo/meta/mapred/local</value>
     <final>true</final>
 </property>

yarn-site.xml

 <property>
    <name>yarn.resourcemanager.address</name>
    <value>192.168.10.152:8032</value>
 </property>
 <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>192.168.10.152:8030</value>
 </property>
 <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>192.168.10.152:8031</value>
 </property>
 <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>192.168.10.152:8033</value>
 </property>
 <property>
      <name>yarn.resourcemanager.webapp.address</name>
      <value>192.168.10.152:8088</value>
 </property>
 <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
 </property>
 <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>

I am looking forward to your reply.


  • Can you add the error stack trace? Without it, it will be very hard to identify the problem in your code. – Mr.Chowdary Apr 09 '15 at 06:34
  • Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120) at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82) at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75) – Jijoice Augustine Apr 09 '15 at 10:39
  • In YARN there is no use in configuring `mapred.job.tracker`. In the `Configuration` object you used `hmaster:54311`, but in `mapred-site.xml` you used the IP address; are both linked, or do they point to different machines? – Mr.Chowdary Apr 10 '15 at 02:03
  • Hi, thanks for your reply. Both hmaster and the IP point to the same machine; here is my /etc/hosts: 127.0.0.1 localhost.localdomain localhost, ::1 localhost6.localdomain6 localhost6, 192.168.10.152 hmaster, 192.168.10.148 slave1, 192.168.10.151 slave2 – Jijoice Augustine Apr 10 '15 at 08:52
  • I also removed mapred.job.tracker from the Configuration, but it still shows the same issue. – Jijoice Augustine Apr 10 '15 at 09:07
  • From the Windows machine I am trying to run the job with the command java -jar Summary.jar historical.Summary, but it shows the error Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:12..... – Jijoice Augustine Apr 10 '15 at 09:10
  • Hope [this](http://stackoverflow.com/questions/22191568/not-able-to-run-hadoop-job-remotely) answer helps you. – Mr.Chowdary Apr 10 '15 at 09:23
  • But when I run it from the Linux machine (where Hadoop is configured) with the java -jar command, it shows the same issue as on Windows. When I run it with the hadoop/bin/hadoop or yarn command, it works successfully. Now I need to know how I can run this job from Eclipse on Windows without any issues; is there any other configuration I need to put in my program? Waiting for your reply. – Jijoice Augustine Apr 10 '15 at 09:42
  • Hi, I went through your link. It talks about installing Hadoop on the Windows machine and running the job there, but my question goes beyond that: to run a job from a Windows machine on a Hadoop cluster configured on Linux machines, do I need to install Hadoop on the Windows machine? Waiting for your reply. – Jijoice Augustine Apr 10 '15 at 10:09
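
A side note on the "Cannot initialize Cluster" error discussed in the comments: the MapReduce Cluster class discovers its client protocol providers through Java's ServiceLoader, so the error usually means the YARN provider (shipped in hadoop-mapreduce-client-jobclient) is not on the runtime classpath when the job is started with plain java -jar. Below is a minimal diagnostic sketch, assuming the Hadoop client jars are on the classpath; the class name ProviderCheck is hypothetical:

    import java.util.ServiceLoader;
    import org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider;

    // Hypothetical diagnostic: prints the ClientProtocolProvider implementations
    // visible to this JVM. If YarnClientProtocolProvider is not listed, the
    // "Cannot initialize Cluster" IOException is expected.
    public class ProviderCheck {
        public static void main(String[] args) {
            for (ClientProtocolProvider p : ServiceLoader.load(ClientProtocolProvider.class)) {
                System.out.println(p.getClass().getName());
            }
        }
    }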

1 Answer


I found an alternative: run the jar (or any Linux command) over SSH from the Java program below. The link and the code are given below.

Link: http://www.journaldev.com/246/java-program-to-run-shell-commands-on-ssh-enabled-system

Program:

package historicalInfo;

import java.io.InputStream;

import com.jcraft.jsch.Channel;
import com.jcraft.jsch.ChannelExec;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class SSHCommandExecutor {

    public static void main(String[] args) {
        // Connection details of the Linux machine that runs the Hadoop cluster.
        String host = "hbasemaster";
        String user = "root";
        String password = "Global12$";

        // The MapReduce jar must first be copied from the Windows machine to
        // this path on the Linux machine.
        String command1 = "java -jar /root/demo/hbase-prgm/map6.jar historicalInfo.MySummaryJob";

        try {
            java.util.Properties config = new java.util.Properties();
            config.put("StrictHostKeyChecking", "no");

            JSch jsch = new JSch();
            Session session = jsch.getSession(user, host, 22);
            session.setPassword(password);
            session.setConfig(config);
            session.connect();
            System.out.println("Connected");

            Channel channel = session.openChannel("exec");
            ((ChannelExec) channel).setCommand(command1);
            channel.setInputStream(null);
            ((ChannelExec) channel).setErrStream(System.err);

            InputStream in = channel.getInputStream();
            channel.connect();

            // Stream the remote command's output until the channel closes.
            byte[] tmp = new byte[1024];
            while (true) {
                while (in.available() > 0) {
                    int i = in.read(tmp, 0, 1024);
                    if (i < 0) break;
                    System.out.print(new String(tmp, 0, i));
                }
                if (channel.isClosed()) {
                    System.out.println("exit-status: " + channel.getExitStatus());
                    break;
                }
                try { Thread.sleep(1000); } catch (Exception ee) { }
            }

            channel.disconnect();
            session.disconnect();
            System.out.println("DONE");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

In the MapReduce program itself, only the configuration below needs to be specified:

    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    conf.set("hbase.zookeeper.quorum", "hmaster");

After running the SSHCommandExecutor program from Eclipse, my MapReduce job ran successfully on the Hadoop cluster.

Enjoy your coding :)