
I ran Giraph 1.1.0 on Hadoop 2.6.0. The mapred-site.xml looks like this:

<configuration>

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>The runtime framework for executing MapReduce jobs. Can be one of
    local, classic or yarn.</description>
</property>

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6144m</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>4</value>
</property>
</configuration>

The giraph-site.xml looks like this:

<configuration>
<property>
        <name>giraph.SplitMasterWorker</name>
        <value>true</value>
</property>
<property>
        <name>giraph.logLevel</name>
        <value>error</value>
</property>
</configuration>

I do not want to run the job in local mode. I have also set the environment variable MAPRED_HOME to HADOOP_HOME. This is the command I use to run the program:

hadoop jar myjar.jar hu.elte.inf.mbalassi.msc.giraph.betweenness.BetweennessComputation /user/$USER/inputbc/inputgraph.txt /user/$USER/outputBC 1.0 1

When I run this code, which computes the betweenness centrality of the vertices in a graph, I get the following exception:

Exception in thread "main" java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run in split master / worker mode since there is only 1 task at a time!
        at org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:168)
        at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:236)
        at hu.elte.inf.mbalassi.msc.giraph.betweenness.BetweennessComputation.runMain(BetweennessComputation.java:214)
        at hu.elte.inf.mbalassi.msc.giraph.betweenness.BetweennessComputation.main(BetweennessComputation.java:218)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

What should I do to ensure that the job does not run in local mode?

3 Answers

I ran into the same problem just a few days ago. Fortunately, I solved it by doing the following.

Modify the configuration file mapred-site.xml: make sure the value of the property 'mapreduce.framework.name' is 'yarn', and add the property 'mapreduce.jobtracker.address' with the value 'yarn' if it is not already present.

The mapred-site.xml looks like this:

<configuration>
   <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>
   <property>
     <name>mapreduce.jobtracker.address</name>
     <value>yarn</value>
   </property>
</configuration>

Restart Hadoop after modifying mapred-site.xml. Then run your program, setting the value after '-w' to more than 1 and 'giraph.SplitMasterWorker' to 'true'. It will probably work.
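On a typical Hadoop 2.x installation, the restart can be done with the stock sbin scripts; a minimal sketch, assuming HADOOP_HOME points at your Hadoop install and the standard start/stop scripts are in place:

```shell
# Stop the YARN and HDFS daemons so the new mapred-site.xml is picked up
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/stop-dfs.sh

# Start them again; the ResourceManager and NodeManager(s) reload the configuration
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

# Re-run the job with more than one worker (last argument > 1)
hadoop jar myjar.jar hu.elte.inf.mbalassi.msc.giraph.betweenness.BetweennessComputation \
    /user/$USER/inputbc/inputgraph.txt /user/$USER/outputBC 1.0 4
```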

As for the cause of the problem, I will just quote someone else's explanation: These properties are designed for single-node execution and have to be changed when running on a cluster of nodes. In that situation, the jobtracker has to point to one of the machines that will be running a NodeManager daemon (a Hadoop slave). As for the framework, it should be changed to 'yarn'.
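To check that the job will actually be submitted to YARN rather than the LocalJobRunner, you can verify that the YARN daemons are up before launching; a quick sketch, assuming the standard Hadoop 2.x tools are on the PATH:

```shell
# jps should list ResourceManager and NodeManager among the running JVMs
jps

# List the NodeManagers registered with the ResourceManager;
# an empty list means jobs cannot run on the cluster
yarn node -list

# Confirm the property is actually present in the active config dir
grep -A1 'mapreduce.framework.name' $HADOOP_HOME/etc/hadoop/mapred-site.xml
```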

田凯飞

We can see in the stack trace that the configuration check in LocalJobRunner fails. This is a bit misleading, because it makes us assume that we are running in local mode. You already found the responsible configuration option, giraph.SplitMasterWorker, but in your case you set it to true. However, with the last command-line parameter, 1, you specify that only a single worker should be used. Hence the framework decides that you MUST be running in local mode. As a solution you have two options:

  • Set giraph.SplitMasterWorker to false although you are running on a cluster.
  • Increase the number of workers by changing the last parameter of the command-line call:

    hadoop jar myjar.jar hu.elte.inf.mbalassi.msc.giraph.betweenness.BetweennessComputation /user/$USER/inputbc/inputgraph.txt /user/$USER/outputBC 1.0 4

Please refer also to my other answer at SO (Apache Giraph master / worker mode) for details on the problem concerning local mode.

Matthias Steinbauer

If you want to split the master from the workers, you can use:

-ca giraph.SplitMasterWorker=true

To specify the number of workers, you can use:

-w #

where "#" is the number of workers you want to use.
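Both flags are understood by Giraph's stock driver, org.apache.giraph.GiraphRunner. A hedged sketch of such an invocation (note that the asker's jar uses its own main class with positional arguments, so this applies only to GiraphRunner-style launches; the jar name, computation class, and paths below are illustrative):

```shell
# Illustrative GiraphRunner invocation: -ca overrides a Giraph property,
# -w sets the number of workers (must be > 1 for split master/worker mode)
hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.SimpleShortestPathsComputation \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip /user/$USER/input/graph.txt \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/$USER/output \
    -w 4 \
    -ca giraph.SplitMasterWorker=true
```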

Emily
  • In my giraph-site.xml giraph.SplitMasterWorker is already set to true. Regarding the number of workers, the fourth parameter in the command actually specifies the number of workers which is 1 in this case. – Sai Ganesh Muthuraman Sep 04 '16 at 05:53
  • This answer is unrelated to the question. OP already has the exact configuration you mention only in configuration files. The specification of number of workers is wrong. See the code at: https://github.com/mbalassi/msc-thesis/blob/master/src/main/java/hu/elte/inf/mbalassi/msc/giraph/betweenness/BetweennessComputation.java – Matthias Steinbauer Sep 05 '16 at 06:33