I am trying to run the shortest paths example from the giraph incubator (https://cwiki.apache.org/confluence/display/GIRAPH/Shortest+Paths+Example). However instead of executing the example from the giraph-*-dependencies.jar, I have created my own job jar. When I created a single Job file as presented in the example, I was getting
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.Test$SimpleShortestPathsVertexInputFormat
Then I have moved the inner classes (SimpleShortestPathsVertexInputFormat and SimpleShortestPathsVertexOutputFormat) to separates files and renamed them just in case (SimpleShortestPathsVertexInputFormat_v2, SimpleShortestPathsVertexOutputFormat_v2); the classes are not static anymore. This have solved the issues of class not found for the SimpleShortestPathsVertexInputFormat_v2, however I am still getting the same error for the SimpleShortestPathsVertexOutputFormat_v2. Below is my stack trace.
INFO mapred.JobClient: Running job: job_201205221101_0003
INFO mapred.JobClient: map 0% reduce 0%
INFO mapred.JobClient: Task Id : attempt_201205221101_0003_m_000005_0, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.utils.SimpleShortestPathsVertexOutputFormat_v2
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:898)
at org.apache.giraph.graph.BspUtils.getVertexOutputFormatClass(BspUtils.java:134)
at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:56)
at org.apache.hadoop.mapred.Task.initialize(Task.java:490)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:352)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.utils.SimpleShortestPathsVertexOutputFormat_v2
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:866)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:890)
... 9 more
I have inspected my job jar and all classes are there. Furthermore I am using hadoop 0.20.203 in a pseudo distributed mode. The way I launch my job is presented below.
hadoop jar giraphJobs.jar org.test.giraph.Test -libjars /path/to/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar /path/to/input /path/to/output 0 3
Also I have defined HADOOP_CLASSPATH for the giraph-*-dependencies.jar. I can run the PageRankBenchmark example without a problem (directly from the giraph-*-dependencies.jar), and the shortes path example works as well (also directly from the giraph-*-dependencies.jar). Other hadoop jobs work without a problem (somewhere I have read to test if my "cluster" works correctly). Does anyone came across similar problem? Any help will be appreciated.
Solution (sorry to post it like this but I can't answer my own question for a couple of more hours)
To solve this issue I had to add my Job jar to the -libjars (no changes to HADOOP_CLASSPATH where made). The command to launch job now looks like this.
hadoop jar giraphJobs.jar org.test.giraph.Test -libjars /path/to/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar,/path/to/job.jar /path/to/input /path/to/output 0 3
List of jars has to be comma separated. Though this has solved my problem. I am still curious why I have to pass my job jar as a "classpath" parameter? Can someone explain me what is the rational behind this? As I found it strange (to say the least) to invoke my job jar and then pass it again as a "classpath" jar. I am really curious about the explanation.