Hadoop Streaming using Python

Question

I am trying to run a MapReduce job with the command below:

hadoop jar /usr/lib/Hadoop/Hadoop-streaming-0.20.2-cdh3u2.jar –file mapper.py –mapper mapper.py –file reducer.py – reducer reducer.py –input /user/training/samplypy.txt –ouput  /user/training/pythonMR/output

I get the following exception:

Exception in thread "main" java.lang.ClassNotFoundException: –file
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

I am using Hadoop 1.0.3. I've tried several versions of the hadoop-streaming jar, such as:

hadoop-streaming-0.20.2-cdh3u2.jar 
hadoop-streaming-1.2.0.jar 
hadoop-streaming.jar

Refer to this: http://stackoverflow.com/questions/16701979/packaging-a-jython-program-in-an-executable-jar — srikanth, Aug 11 '15 at 11:18

score 1 · Answer 1 · answered Nov 17 '15 at 16:29

One thing I can tell is that you did not use the full path for the '-file' option:

-file /mapper/location/mapper.py (use the full path with the file name here)

-mapper mapper.py (correct, mapper file name only)

-file /reducer/location/reducer.py (use the full path with the file name here)

-reducer reducer.py (correct, reducer file name only)

Also make sure your -input and -output point to HDFS, not a local path.
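
Separately, the exception in the question names `–file`, which appears to start with an en dash rather than an ASCII hyphen. That often happens when a command is copied from a word processor or a web page. Whether it is the actual cause here is an assumption on my part, but it is cheap to check. A minimal sketch (the function names are made up for illustration):

```python
import shlex

# Dash-like characters that editors and web pages often substitute for "-".
LOOKALIKE_DASHES = {
    "\u2013": "EN DASH",
    "\u2014": "EM DASH",
    "\u2212": "MINUS SIGN",
}


def find_lookalike_dashes(command):
    """Return (token, character name) pairs for tokens containing a look-alike dash."""
    hits = []
    for token in shlex.split(command):
        for char, name in LOOKALIKE_DASHES.items():
            if char in token:
                hits.append((token, name))
    return hits


def normalize_dashes(command):
    """Replace every look-alike dash with an ASCII hyphen."""
    for char in LOOKALIKE_DASHES:
        command = command.replace(char, "-")
    return command
```

Paste the command into `find_lookalike_dashes` as a string; an empty list means none of these characters were found. Note that `normalize_dashes` only swaps characters, so a stray space such as the one in `– reducer` still has to be removed by hand.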

Here is the code I used:

hadoop jar /opt/cloudera/parcels/hadoop-streaming.jar \
-D mapred.reduce.tasks=15 -D stream.map.input.field.separator=',' -D stream.map.output.field.separator=',' \
-D mapred.textoutputformat.separator=',' \
-input /user/temp/in/ \
-output /user/temp/out \
-file  /app/qa/python/mapper.py \
-mapper mapper.py \
-file  /app/qa/python/reducer.py \
-reducer reducer.py
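
For reference, streaming scripts just read lines from stdin and write records to stdout. The following is only a sketch of what a mapper and reducer could look like, not the scripts I actually ran: it assumes a word count job and the default tab separator (the command above overrides the separators with -D options, so what the scripts see depends on those settings). Both roles are in one block to keep it self-contained; the "map"/"reduce" argument is a hypothetical convention, and in practice you would split this into mapper.py and reducer.py.

```python
#!/usr/bin/env python
import sys
from itertools import groupby

SEP = "\t"  # assumption: default streaming key/value separator


def mapper(lines):
    """Yield one 'word<SEP>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s%s1" % (word, SEP)


def reducer(lines):
    """Sum counts per key; input must arrive sorted by key, as streaming delivers it."""
    # Lines without a separator are skipped rather than treated as errors.
    records = (line.rstrip("\n").split(SEP, 1) for line in lines if SEP in line)
    for key, group in groupby(records, key=lambda rec: rec[0]):
        total = sum(int(count) for _, count in group)
        yield "%s%s%d" % (key, SEP, total)


def main(argv):
    # Hypothetical convention: the first argument selects the role.
    if len(argv) != 2 or argv[1] not in ("map", "reduce"):
        return  # no role given: do nothing, which keeps the module importable
    step = mapper if argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)


if __name__ == "__main__":
    main(sys.argv)
```

You can test the logic locally before submitting the job by piping a sample file through the map step, `sort`, and then the reduce step.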
