I am new to Hadoop and Python, and I am trying to write a mapper for logs in Common Log Format:

[training@localhost code]$ ls 
access_log  mapper2.py  mylocalfile.txt
reducer2.py  reducer4.py  testfile
cat    mapper.py   practice.py   reducer3.py  reducer.py
[training@localhost code]$ vim reducer4.py
[training@localhost code]$ hs mapper2.py reducer2.py access_log output2
packageJobJar: [mapper2.py, reducer2.py, /tmp/hadoop-training/hadoop-unjar2368120810978008335/] [] /tmp/streamjob7636411608115265060.jar tmpDir=null
17/01/05 05:50:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/01/05 05:50:32 WARN snappy.LoadSnappy: Snappy native library is available
17/01/05 05:50:32 INFO snappy.LoadSnappy: Snappy native library loaded
17/01/05 05:50:32 INFO mapred.JobClient: Cleaning up the staging area hdfs://0.0.0.0:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/training/.staging/job_201701041500_0009
17/01/05 05:50:32 ERROR security.UserGroupInformation: PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://0.0.0.0:8020/user/training/access_log
17/01/05 05:50:32 ERROR streaming.StreamJob: Error Launching job : Input path does not exist: hdfs://0.0.0.0:8020/user/training/access_log
Streaming Command Failed!

I don't understand why it can't find the file. What am I doing wrong?

Fia256
  • Have you checked whether that path exists on HDFS? – Rajat Mishra Jan 08 '17 at 03:29
  • Possible duplicate of [Hadoop Streaming Command Failure with Python Error](http://stackoverflow.com/questions/15302262/hadoop-streaming-command-failure-with-python-error) – Arnab Nandy Jan 08 '17 at 04:20
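
Following up on the first comment: the `ls` output in the question lists the local directory, while the error message shows the job looking for `access_log` under the HDFS home directory `/user/training`. A minimal sketch of how one might check for the file on HDFS and upload it if it is missing, assuming the `hadoop` command is on the PATH. The function names and the `run` parameter are hypothetical and only there for illustration. It relies on `hadoop fs -test -e`, which exits with 0 when the path exists, and on `hadoop fs -put`, which copies a local file to HDFS.

```python
import subprocess


def hdfs_exists_cmd(path):
    # Command that exits with status 0 if `path` exists on HDFS.
    return ["hadoop", "fs", "-test", "-e", path]


def hdfs_put_cmd(local_path, hdfs_path):
    # Command that copies a local file to HDFS.
    return ["hadoop", "fs", "-put", local_path, hdfs_path]


def ensure_on_hdfs(local_path, hdfs_path, run=subprocess.call):
    # `run` takes a command list and returns its exit status.
    # It defaults to subprocess.call and can be swapped out for a dry run.
    if run(hdfs_exists_cmd(hdfs_path)) == 0:
        return 0  # already on HDFS, nothing to upload
    return run(hdfs_put_cmd(local_path, hdfs_path))


# Example (assumption: run from the directory that holds access_log):
# ensure_on_hdfs("access_log", "access_log")
```

A relative HDFS path such as `access_log` resolves against the user's HDFS home directory, which matches the `/user/training/access_log` path in the error. If the upload succeeds, rerunning the streaming command from the question should at least get past the input path check.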
