I'm new to Hadoop and I'm running a MapReduce job to count the revenue of different stores. The mapper and reducer programs work perfectly on their own, and I double-checked the files and the directories.
When I run the MapReduce command, which is:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce1/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar \
-mapper mapper.py \
-reducer reducer.py \
-input /home/anwarvic \
-output /joboutput
it gives the following output:
17/04/30 05:48:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/30 05:48:14 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
packageJobJar: [mapper.py, reducer.py] [] /tmp/streamjob7598928362555913238.jar tmpDir=null
17/04/30 05:48:15 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/30 05:48:16 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/30 05:48:21 INFO mapred.FileInputFormat: Total input paths to process : 5
17/04/30 05:48:21 INFO net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010
17/04/30 05:48:24 INFO mapreduce.JobSubmitter: number of splits:6
17/04/30 05:48:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1493523215757_0002
17/04/30 05:48:27 INFO impl.YarnClientImpl: Submitted application application_1493523215757_0002
17/04/30 05:48:28 INFO mapreduce.Job: The url to track the job: http://anwar-computer:8088/proxy/application_1493523215757_0002/
17/04/30 05:48:28 INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
17/04/30 05:48:28 INFO streaming.StreamJob: Running job: job_1493523215757_0002
17/04/30 05:48:28 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/30 05:48:29 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:49:08 INFO streaming.StreamJob: map 17% reduce 0%
17/04/30 05:49:10 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:49:41 INFO streaming.StreamJob: map 17% reduce 0%
17/04/30 05:49:42 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:49:43 INFO streaming.StreamJob: map 17% reduce 0%
17/04/30 05:49:45 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:50:07 INFO streaming.StreamJob: map 17% reduce 0%
17/04/30 05:50:08 INFO streaming.StreamJob: map 0% reduce 0%
17/04/30 05:50:37 INFO streaming.StreamJob: map 100% reduce 100%
17/04/30 05:50:41 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/30 05:50:41 ERROR streaming.StreamJob: Job not successful. Error: Task failed task_1493523215757_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
17/04/30 05:50:41 INFO streaming.StreamJob: killJob...
17/04/30 05:50:41 INFO impl.YarnClientImpl: Killed application application_1493523215757_0002
Streaming Command Failed!
The output basically says that the job was not successful, even though the Map and Reduce phases both reached 100%.
As stated in this answer and this one, I added the shebang header to both the mapper.py and reducer.py files:
#!/usr/bin/env python
By the way, this answer didn't work for me!
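For context, both scripts follow the usual Hadoop Streaming pattern: the mapper reads lines from stdin and writes tab-separated store/revenue pairs to stdout, and the reducer sums the values per store. This is only a simplified sketch of that structure, not my exact code (the real scripts do the actual parsing of the revenue files), but the overall shape is roughly the same:

#!/usr/bin/env python
# mapper.py (simplified sketch): emit "store<TAB>revenue" for each input line
import sys

for line in sys.stdin:
    parts = line.strip().split('\t')
    if len(parts) < 2:
        continue  # skip malformed lines
    store, revenue = parts[0], parts[1]
    print("%s\t%s" % (store, revenue))

#!/usr/bin/env python
# reducer.py (simplified sketch): sum revenue per store
# (streaming delivers the mapper output sorted by key)
import sys

current_store = None
total = 0.0

for line in sys.stdin:
    parts = line.strip().split('\t')
    if len(parts) != 2:
        continue  # skip malformed lines
    store, revenue = parts
    try:
        revenue = float(revenue)
    except ValueError:
        continue
    if store == current_store:
        total += revenue
    else:
        if current_store is not None:
            print("%s\t%s" % (current_store, total))
        current_store = store
        total = revenue

if current_store is not None:
    print("%s\t%s" % (current_store, total))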
I've been stuck on this problem for around 20 hours, so any help would be greatly appreciated.