I am trying to run a simple MR program using rmr2 in a single node Hadoop cluster. Here is the environment for the setup
Ubuntu 12.04 (32 bit)
R (Ubuntu comes with 2.14.1, so updated to 3.0.2)
Installed the latest rmr2 and rhdfs from here and the corresponding dependencies
Hadoop 1.2.1
Now I am trying to run a simple MR program as
Sys.setenv(HADOOP_HOME="/home/training/Installations/hadoop-1.2.1")
Sys.setenv(HADOOP_CMD="/home/training/Installations/hadoop-1.2.1/bin/hadoop")
library(rmr2)
library(rhdfs)
ints = to.dfs(1:100)
calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v))
from.dfs(calc)
The mapreduce job fails with the below error message in hadoop-1.2.1/logs/userlogs/job_201310091055_0001/attempt_201310091055_0001_m_000000_0/stderr
Error in library(functional) : there is no package called ‘functional’
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
But, the sessionInfo()
shows that functional package has been loaded
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i686-pc-linux-gnu (32-bit)
>locale:
[1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
[4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
[7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C
>attached base packages:
[1] stats graphics grDevices utils datasets methods base
>other attached packages:
[1] rhdfs_1.0.6 rJava_0.9-4 rmr2_2.3.0 reshape2_1.2.2 plyr_1.8
[6] stringr_0.6.2 **functional_0.4** digest_0.6.3 bitops_1.0-6 RJSONIO_1.0-3
[11] Rcpp_0.10.5
Update : I am able to run a R MR job reading and writing from STDIO without using the rmr2 and the rhdfs libraries as mentioned here. So, for now my guess is that the problem is isolated to rmr2 and the rhdfs packages.
How to get around this problem?