
I am running an rmr2 example from here; this is the code I tried:

Sys.setenv(HADOOP_HOME="/home/istvan/hadoop")
Sys.setenv(HADOOP_CMD="/home/istvan/hadoop/bin/hadoop")

library(rmr2)
library(rhdfs)

ints = to.dfs(1:100)
calc = mapreduce(input = ints,
                 map = function(k, v) cbind(v, 2*v))
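
For reference, rmr2 also reads the HADOOP_STREAMING environment variable, which I have set as well; a minimal sketch of that line, where the jar path is an assumption based on my layout and hadoop-streaming-1.1.1.jar:

# Assumption: a typical contrib layout for hadoop-streaming-1.1.1.jar;
# adjust the path to wherever the streaming jar actually lives.
Sys.setenv(HADOOP_STREAMING="/home/istvan/hadoop/contrib/streaming/hadoop-streaming-1.1.1.jar")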

I am using hadoop-streaming-1.1.1.jar. After I call the mapreduce function, the job starts and then fails with this exception:

2013-12-16 16:26:14,844 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2013-12-16 16:26:15,600 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /app/cloudera/mapred/local/taskTracker/nkumar/jobcache/job_201312160142_0009/jars/job.jar <- /app/cloudera/mapred/local/taskTracker/nkumar/jobcache/job_201312160142_0009/attempt_201312160142_0009_m_000000_0/work/job.jar
2013-12-16 16:26:15,604 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /app/cloudera/mapred/local/taskTracker/nkumar/jobcache/job_201312160142_0009/jars/.job.jar.crc <- /app/cloudera/mapred/local/taskTracker/nkumar/jobcache/job_201312160142_0009/attempt_201312160142_0009_m_000000_0/work/.job.jar.crc
2013-12-16 16:26:15,693 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
2013-12-16 16:26:15,695 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2013-12-16 16:26:16,312 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-12-16 16:26:16,319 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6bdc64a5
2013-12-16 16:26:16,757 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and  BYTES_READ as counter name instead
2013-12-16 16:26:16,763 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2013-12-16 16:26:16,772 INFO org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2013-12-16 16:26:16,779 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 450
2013-12-16 16:26:17,432 INFO org.apache.hadoop.mapred.MapTask: data buffer = 358612992/448266240
2013-12-16 16:26:17,432 INFO org.apache.hadoop.mapred.MapTask: record buffer = 1179648/1474560
2013-12-16 16:26:17,477 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/Rscript, ./rmr-streaming-map5b17a2a9ff]
2013-12-16 16:26:17,561 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2013-12-16 16:26:17,570 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2013-12-16 16:26:17,571 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2013-12-16 16:26:17,587 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-12-16 16:26:17,591 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
2013-12-16 16:26:17,605 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

It is creating a sequence file in the /tmp directory on HDFS. Any suggestions on how to fix this? Thanks.
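
To confirm that temporary file is really there, it can be listed through the rhdfs package loaded above; a minimal sketch, assuming a default rhdfs setup:

# Sketch: list the rmr2 temp files that to.dfs() wrote under /tmp on HDFS.
# hdfs.init() relies on HADOOP_CMD, which is set at the top of the script.
hdfs.init()
hdfs.ls("/tmp")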

Edit:

I found this answer, Hadoop Streaming Job failed error in python, so I tried executing the R script with each of these two lines at the top:

#!/usr/bin/Rscript
#!/usr/bin/env Rscript

No luck.
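
To check whether the map function itself is the problem, rmr2's local backend can run the same job inside the current R session, bypassing Hadoop streaming entirely so any R error prints directly; a minimal sketch of that check:

# Sketch: rerun the job on rmr2's local backend; an R-side failure in the
# map function will surface here without Hadoop streaming in the way.
library(rmr2)
rmr.options(backend = "local")
ints = to.dfs(1:100)
calc = mapreduce(input = ints,
                 map = function(k, v) cbind(v, 2*v))
from.dfs(calc)
rmr.options(backend = "hadoop")  # switch back for the real run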

  • Is your file created? Try from.dfs(ints) (see the sketch after these comments). Do you see anything? Have you set the HADOOP_STREAMING env var? Is your other path correct (/home/istvan/hadoop)? – saurabh shashank Dec 16 '13 at 06:44
  • from.dfs(ints) returns: `13/12/16 06:39:03 INFO compress.CodecPool: Got brand-new decompressor [.deflate] $key NULL $val [1] 1 2 3 4 5 6 7 8 9 10`. `HADOOP_STREAMING` is also set, and `HADOOP_HOME` points to the correct directory. – Naveen Dec 16 '13 at 11:41
  • Can you check the permissions on the hadoop streaming jar file? Try chmod +X $HADOOP_STREAMING. The TaskTracker log files would also be a real help, if you can paste them. – saurabh shashank Dec 16 '13 at 12:04
  • I tried making the hadoop-streaming jar executable; same error. On the TaskTracker I can only see XML files that hold the variables for that job. – Naveen Dec 16 '13 at 12:39
  • Try `R`, then `library(rmr2)`. It should load the required packages: Rcpp, RJSONIO, bitops, digest, functional, stringr, plyr, reshape2. Do you see all of the above packages loading? – saurabh shashank Dec 16 '13 at 13:52
  • Yes, all required packages are loaded: `> library(rmr2) Loading required package: Rcpp Loading required package: RJSONIO Loading required package: bitops Loading required package: digest Loading required package: functional Loading required package: stringr Loading required package: plyr Loading required package: reshape2`. I am also able to run a map-reduce job from a Java program reading that sequence file. – Naveen Dec 16 '13 at 14:04
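
Consolidating the checks from the comments above into one place, this is roughly the verification sequence discussed there; a sketch that only confirms the input side, since the streaming failure itself still stands:

# Sketch of the checks from the comments: write the input, read it straight
# back, and confirm the streaming jar path is set and points at a real file.
ints = to.dfs(1:100)
out = from.dfs(ints)                          # expect $key NULL, $val 1:100
str(out)
Sys.getenv("HADOOP_STREAMING")                # should print the jar path
file.exists(Sys.getenv("HADOOP_STREAMING"))   # should be TRUE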

0 Answers