I'm running the following code in RHadoop:

Sys.setenv(HADOOP_HOME="/home/ashkan/Downloads/hadoop-1.0.3/")
Sys.setenv(HADOOP_BIN="/home/ashkan/Downloads/hadoop-1.0.3/bin/")
Sys.setenv(HADOOP_CONF_DIR="/home/ashkan/Downloads/hadoop-1.0.3/conf")
Sys.setenv(HADOOP_CMD="/home/ashkan/Downloads/hadoop-1.0.3/bin/hadoop")
library (Rhipe)
library(rhdfs)
library(rmr2)

hdfs.init()
small.ints = to.dfs(1:10)
mapreduce(
  input = small.ints,
  map = function(k, v) {
    lapply(seq_along(v), function(r) {
      x <- runif(v[[r]])
      keyval(r, c(max(x), min(x)))
    })
  })

However, I get the following error:

Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 1

Does anyone know what the problem is? Thanks a lot.

Ashkan
  • Can you run just the 'map' portion of your code on the command line and get meaningful output? ... That will tell you whether it's a code issue or an environment setup issue. – economy Mar 07 '15 at 00:28

1 Answer

To fix the problem, you'll have to set the HADOOP_STREAMING environment variable so that rmr2 can find the Hadoop streaming jar. The code below worked fine for me. Note that your code doesn't use Rhipe, so there's no need to load it.

R code (I'm using Hadoop 2.4.0):

Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")

library(rhdfs)
library(rmr2)

# Initialise the HDFS connection
hdfs.init()

small.ints = to.dfs(1:10)
mapreduce(
  input = small.ints,
  map = function(k, v) {
    lapply(seq_along(v), function(r) {
      x <- runif(v[[r]])
      keyval(r, c(max(x), min(x)))
    })
  })
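
Before submitting the job, it may help to confirm that both variables point at files that actually exist, since a wrong path is an easy mistake to make here. This is only a rough sketch in base R; `check_hadoop_env` is a hypothetical helper I made up for illustration and is not part of rmr2 or rhdfs.

```r
# Hypothetical helper (not from rmr2/rhdfs): report which of the given
# environment variables are unset or point to a file that does not exist.
check_hadoop_env <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING")) {
  paths <- Sys.getenv(vars, unset = "", names = TRUE)
  ok <- nzchar(paths) & file.exists(paths)
  for (v in vars[!ok]) {
    message(v, " is unset or points to a missing file: '", paths[[v]], "'")
  }
  invisible(all(ok))  # TRUE only if every variable resolves to an existing file
}

check_hadoop_env()
```

If it prints a message for either variable, fix that path before calling mapreduce.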

I'm guessing that for your Hadoop 1.0.3 install the streaming path will be as below:

Sys.setenv("HADOOP_STREAMING"="/home/ashkan/Downloads/hadoop-1.0.3/contrib/streaming/hadoop-streaming-1.0.3.jar")
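
Since that path is a guess, you could also search the install directory for the jar instead of typing it by hand. A minimal sketch, assuming the jar's file name starts with "hadoop-streaming"; `find_streaming_jar` is a hypothetical helper (not an rmr2 function), and the directory is the one from your question.

```r
# Hypothetical helper: look for a hadoop-streaming jar under a Hadoop install.
find_streaming_jar <- function(hadoop_home) {
  jars <- list.files(
    hadoop_home,
    pattern = "^hadoop-streaming.*\\.jar$",
    recursive = TRUE,
    full.names = TRUE
  )
  # Skip source/test jars that some distributions ship next to the real one
  jars <- jars[!grepl("sources|test", basename(jars))]
  if (length(jars) == 0) {
    stop("No hadoop-streaming jar found under ", hadoop_home)
  }
  jars[[1]]
}

# Assumed install directory, taken from the question
hadoop_home <- "/home/ashkan/Downloads/hadoop-1.0.3"
if (dir.exists(hadoop_home)) {
  Sys.setenv(HADOOP_STREAMING = find_streaming_jar(hadoop_home))
  print(Sys.getenv("HADOOP_STREAMING"))
}
```

If more than one jar matches, this just takes the first, so check the printed path is the one you want.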

Hope this helps.

Manohar Swamynathan