I'm using Hadoop 1.0.1 on a single node and I'm trying to stream a tab-delimited file using Python 2.7. I can get Michael Noll's word-count scripts to run with Hadoop/Python, but I can't get this extremely simple mapper and reducer pair, which just reproduces the file, to work. Here's the mapper:
import sys

# identity mapper: echo each input line, stripped of surrounding whitespace
for line in sys.stdin:
    line = line.strip()
    print '%s' % line
Here's the reducer:
import sys

# identity reducer: echo each input line, stripped of surrounding whitespace
for line in sys.stdin:
    line = line.strip()
    print line
Here's part of the input file:
1 857774.000000
2 859164.000000
3 859350.000000
...
The mapper and reducer work fine from the Linux command line:
cat input.txt | python mapper.py | sort | python reducer.py > a.out
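As I understand it, Hadoop streaming executes the mapper and reducer directly as programs rather than through the python interpreter, so a closer local equivalent (assuming each script is executable and begins with a #!/usr/bin/env python shebang line) would be:
cat input.txt | ./mapperSimple.py | sort | ./reducerSimple.py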
However, after I chmod the mapper and reducer, move the input file to HDFS, and confirm that it's there, I run:
bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar -file mapperSimple.py -mapper mapperSimple.py -file reducerSimple.py -reducer reducerSimple.py -input inputDir/* -output outputDir
I get the following error:
12/06/03 10:19:11 INFO streaming.StreamJob: map 0% reduce 0%
12/06/03 10:20:15 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201206030550_0003_m_000001
12/06/03 10:20:15 INFO streaming.StreamJob: killJob...
Streaming Job Failed!
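Since the streaming output doesn't show the underlying Python error, I'm guessing the failed attempt's stderr would. On Hadoop 1.x that should be somewhere under the task logs, or viewable through the JobTracker web UI at http://localhost:50030 (assuming the default port); the path below is a guess based on the default $HADOOP_HOME/logs layout and the task ID from the output above:
cat logs/userlogs/attempt_201206030550_0003_m_000001_0/stderr  # some 1.x builds nest this under a job_... subdirectory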
Any ideas? Thanks.