
I had written a MapReduce program using Hadoop Streaming in Python, which worked on the Udacity training virtual machine. To run the Hadoop Streaming command they had an alias, `hs mapper reducer input output ...`, and it worked perfectly. I have now switched over to the Cloudera training VM, and when I run the exact same MapReduce job using the actual streaming command, it fails. Is there anything I've done wrong?

The streaming command I used is:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.6.0-cdh5.7.0.jar \
    -input test \
    -output eout \
    -mapper "matest1.py" -file matest1.py \
    -reducer "retest2.py" -file retest2.py

Is there any solution?
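For context, "subprocess failed with code 1" usually means the mapper or reducer script itself crashed or could not be executed on the task nodes. A minimal sketch of the stdin/stdout contract a streaming mapper and reducer must honor (these are illustrative word-count functions, not the asker's matest1.py/retest2.py; note the `#!/usr/bin/env python` shebang, which the streaming wrapper needs to launch the script):

```python
#!/usr/bin/env python
# Hypothetical minimal Hadoop Streaming pair (word count), shown as
# functions so the logic is testable outside the cluster.
import sys

def map_lines(lines):
    """Mapper: emit one tab-separated (word, 1) pair per word."""
    for line in lines:
        for word in line.strip().split():
            yield "%s\t1" % word

def reduce_lines(sorted_lines):
    """Reducer: sum counts for each word; input must be sorted by key,
    which is exactly what the shuffle phase (or `sort` locally) provides."""
    current, count = None, 0
    for line in sorted_lines:
        word, n = line.strip().split("\t")
        if word != current:
            if current is not None:
                yield "%s\t%d" % (current, count)
            current, count = word, 0
        count += int(n)
    if current is not None:
        yield "%s\t%d" % (current, count)

if __name__ == "__main__":
    # Run as a mapper by default; a real deployment would split this
    # into two scripts, one per role.
    for pair in map_lines(sys.stdin):
        print(pair)
```

If the scripts work locally but fail on the cluster, the usual suspects are a missing shebang, missing execute permission (`chmod +x`), or Windows line endings in the script file.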

Edit: this is the output error:

16/06/11 13:25:50 INFO mapreduce.Job: Task Id : attempt_1465622696533_0007_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

16/06/11 13:26:11 INFO mapreduce.Job:  map 100% reduce 100%
16/06/11 13:26:12 INFO mapreduce.Job: Job job_1465622696533_0007 failed with state FAILED due to: Task failed task_1465622696533_0007_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0

16/06/11 13:26:13 INFO mapreduce.Job: Counters: 9
    Job Counters 
        Failed map tasks=8
        Launched map tasks=8
        Other local map tasks=6
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=177373
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=177373
        Total vcore-seconds taken by all map tasks=177373
        Total megabyte-seconds taken by all map tasks=181629952
16/06/11 13:26:13 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

Edit: stderr:

Jun 12, 2016 12:09:29 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Jun 12, 2016 12:09:29 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Jun 12, 2016 12:09:29 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Jun 12, 2016 12:09:29 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Jun 12, 2016 12:09:29 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Jun 12, 2016 12:09:30 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Jun 12, 2016 12:09:32 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Jun 12, 2016 12:09:34 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Jun 12, 2016 12:09:35 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"

Edit: syslog:

2016-06-12 12:12:31,459 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1465711165129_0004_m_000001_3: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2016-06-12 12:12:31,460 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1465711165129_0004_m_000001_3 TaskAttempt Transitioned from RUNNING to FAIL_FINISHING_CONTAINER
2016-06-12 12:12:31,460 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1465711165129_0004_m_000000_3: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2016-06-12 12:12:31,461 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1465711165129_0004_m_000000_3 TaskAttempt Transitioned from RUNNING to FAIL_FINISHING_CONTAINER
2016-06-12 12:12:31,749 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1465711165129_0004_m_000001 Task Transitioned from RUNNING to FAILED
2016-06-12 12:12:31,749 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1465711165129_0004_m_000000 Task Transitioned from RUNNING to FAILED
2016-06-12 12:12:31,750 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2016-06-12 12:12:31,780 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Job failed as tasks failed. failedMaps:1 failedReduces:0
2016-06-12 12:12:31,792 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1465711165129_0004Job Transitioned from RUNNING to FAIL_WAIT
2016-06-12 12:12:31,793 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1465711165129_0004_r_000000 Task Transitioned from SCHEDULED to KILL_WAIT
2016-06-12 12:12:31,793 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1465711165129_0004_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to KILLED
2016-06-12 12:12:31,793 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1465711165129_0004_r_000000 Task Transitioned from KILL_WAIT to KILLED
2016-06-12 12:12:31,794 INFO [Thread-52] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Processing the event EventType: CONTAINER_DEALLOCATE
2016-06-12 12:12:31,796 ERROR [Thread-52] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Could not deallocate container for task attemptId attempt_1465711165129_0004_r_000000_0
2016-06-12 12:12:32,015 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1465711165129_0004Job Transitioned from FAIL_WAIT to FAIL_ABORT
2016-06-12 12:12:32,019 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_ABORT
2016-06-12 12:12:32,071 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:2 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:12 ContRel:4 HostLocal:2 RackLocal:0
2016-06-12 12:12:32,074 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:4096, vCores:5>
2016-06-12 12:12:32,074 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold reached. Scheduling reduces.
2016-06-12 12:12:32,074 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: All maps assigned. Ramping up all remaining reduces:1
2016-06-12 12:12:32,074 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:1 AssignedMaps:2 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:12 ContRel:4 HostLocal:2 RackLocal:0
2016-06-12 12:12:32,194 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1465711165129_0004Job Transitioned from FAIL_ABORT to FAILED
  • The Udacity environment does not match the cloudera VM. Please check that all paths to the files you are using are correct and add what the actual error is to your question with an [edit], please. – OneCricketeer Jun 11 '16 at 07:42
  • I have made the edits with the error; is there anything else I should do? @cricket_007 – Rahul Aedula Jun 11 '16 at 08:00
  • You should go into the YARN UI or Application Master, or whatever the view is that tells you the reason why the Map task failed. – OneCricketeer Jun 11 '16 at 08:06
  • Rather, the "Job History Server" – OneCricketeer Jun 11 '16 at 08:08
  • This is a stupid question, but where can I find that Job History Server? I'm really new to Hadoop, so please excuse me. @cricket_007 – Rahul Aedula Jun 11 '16 at 08:30
  • From Cloudera Manager in the MapReduce view, there should be something. Here's an old documentation guide -- https://www.cloudera.com/documentation/archive/manager/4-x/4-5-2/Cloudera-Manager-Enterprise-Edition-User-Guide/cmeeug_topic_8_3.html – OneCricketeer Jun 11 '16 at 08:35
  • Or try `http://quickstart.cloudera:10020` – OneCricketeer Jun 11 '16 at 08:36
  • I have a small question: does CDH 5 treat standard input any differently from CDH 4.1? The same program works on the Udacity VM just fine. The only difference between the two programs is that the one that doesn't work has a `numpy.loadtxt(sys.stdin)`. @cricket_007 – Rahul Aedula Jun 11 '16 at 17:16
  • Not that I'm aware of. Standard input is an operating system thing, not application level. If you are using numpy, then I think you have to bundle the numpy library into the Hadoop jar command. I'm not completely sure about that, though – OneCricketeer Jun 11 '16 at 17:21
  • If I run a job locally, i.e. `cat test | ./mapper.py | sort | ./reducer.py`, does that mean it has to work on Hadoop Streaming? Or are there cases where it doesn't? @cricket_007 – Rahul Aedula Jun 12 '16 at 05:26
  • Also, I've checked the Job Browser in Hue for the error logs; the logs are the same as I posted, and the stderr and stdout are completely blank. @cricket_007 – Rahul Aedula Jun 12 '16 at 06:20
  • OK, I just managed to generate the stderr and syslog files; posting them now. – Rahul Aedula Jun 12 '16 at 06:54
  • I can't tell anything from either of those logs. I want to say there's a problem either with the permissions of the files themselves or with the directories you are trying to read from or write to. You could start by reading over this link: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/ – OneCricketeer Jun 12 '16 at 15:16

1 Answer


It most probably is an error in reading the input data files: all eight map tasks are failing. Perhaps there is a problem with the file size, or with the mapper itself. Have a read; this may help:

https://github.com/RevolutionAnalytics/rmr2/issues/112

or this answer: python - PipeMapRed.waitOutputThreads(): subprocess failed with code 1

  • I've seen both of these answers before; they don't give any particular insight into my problem. How does the reading of an input data file change from CDH 4 to CDH 5? It doesn't make sense. – Rahul Aedula Jun 17 '16 at 13:46
  • In the Udacity VM there's a command called `hs`, which is basically an alias for the streaming command; is there anything more I should add to my command? – Rahul Aedula Jun 17 '16 at 13:48
  • Exit code 1 means it is most probably an error within your Python code! Have a read: http://stackoverflow.com/a/24913246/6210905 – Himanshu Mangla Jun 20 '16 at 15:31
  • Here's the thing: the Python code is fine; individually it works just fine. Running `python mapper.py` or `python reducer.py` does only one thing, i.e. waits for standard input, and when I simulate it as `cat test.txt | ./mapper.py | sort | ./reducer.py` it gives the output. – Rahul Aedula Jun 23 '16 at 13:49
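The `cat | mapper | sort | reducer` check exercises only the logic of the scripts, not the cluster environment: on the nodes the scripts also need a correct shebang, execute permission, and any third-party libraries (e.g. numpy) installed. The local check itself can be reproduced in-process as a sketch like this (the word-count mapper and reducer here are hypothetical stand-ins for the asker's scripts):

```python
# Hypothetical in-process harness reproducing
#   cat test.txt | ./mapper.py | sort | ./reducer.py
# Passing here proves the map/reduce logic, but NOT that the scripts
# will launch correctly under Hadoop Streaming on the cluster.

def word_count_map(line):
    """Toy mapper: one (word, 1) pair per word."""
    return ["%s\t1" % w for w in line.split()]

def word_count_reduce(sorted_pairs):
    """Toy reducer: sum counts per word, preserving sorted key order."""
    totals, order = {}, []
    for pair in sorted_pairs:
        word, n = pair.split("\t")
        if word not in totals:
            order.append(word)
            totals[word] = 0
        totals[word] += int(n)
    return ["%s\t%d" % (w, totals[w]) for w in order]

def run_pipeline(input_lines, mapper, reducer):
    """Map every line, sort the intermediate pairs (the shuffle), reduce."""
    mapped = []
    for line in input_lines:
        mapped.extend(mapper(line))
    return reducer(sorted(mapped))
```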