4

I am trying to run a simple MR program using rmr2 on a single-node Hadoop cluster. Here is the environment for the setup:

Ubuntu 12.04 (32 bit)
R (Ubuntu comes with 2.14.1, so updated to 3.0.2)
Installed the latest rmr2 and rhdfs from here and the corresponding dependencies
Hadoop 1.2.1

I am trying to run the following simple MR program:

Sys.setenv(HADOOP_HOME="/home/training/Installations/hadoop-1.2.1")
Sys.setenv(HADOOP_CMD="/home/training/Installations/hadoop-1.2.1/bin/hadoop")

library(rmr2)  
library(rhdfs)

ints = to.dfs(1:100)  
calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v))
from.dfs(calc)

The mapreduce job fails with the following error message in hadoop-1.2.1/logs/userlogs/job_201310091055_0001/attempt_201310091055_0001_m_000000_0/stderr:

Error in library(functional) : there is no package called ‘functional’  
Execution halted  
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1  
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)  
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)  

However, sessionInfo() shows that the functional package has been loaded:

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_IN       LC_NUMERIC=C         LC_TIME=en_IN       
 [4] LC_COLLATE=en_IN     LC_MONETARY=en_IN    LC_MESSAGES=en_IN   
 [7] LC_PAPER=en_IN       LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rhdfs_1.0.6    rJava_0.9-4    rmr2_2.3.0     reshape2_1.2.2 plyr_1.8      
 [6] stringr_0.6.2  **functional_0.4** digest_0.6.3   bitops_1.0-6   RJSONIO_1.0-3 
[11] Rcpp_0.10.5

Update: I am able to run an R MR job that reads from and writes to STDIO without using the rmr2 and rhdfs libraries, as mentioned here. So, for now, my guess is that the problem is isolated to the rmr2 and rhdfs packages.
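
For reference, the map step of such a plain Hadoop Streaming job could look roughly like the sketch below. It is an illustration only: the function name `double_mapper` is made up, and it assumes one numeric value per input line and a tab as the key/value separator (the Hadoop Streaming default).

```r
# Map step: for each input value v, build the line "v<TAB>2*v".
# Empty lines are skipped.
double_mapper <- function(lines) {
  v <- as.numeric(lines[nzchar(lines)])
  paste(v, 2 * v, sep = "\t")
}

# In a real streaming script the function would be wired to STDIO, e.g.:
# writeLines(double_mapper(readLines(file("stdin"))))
```

A script like this only needs base R, which would explain why it runs even when the rmr2 dependencies are not visible to the Hadoop task.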

How can I get around this problem?

Praveen Sripati
  • Did you upgrade R and install all the packages as `sudo ...` (as root)? Perhaps you are having a user vs system context issue. The environment seen by the R script running in Hadoop may not be the same as that of a user's interactive R command line, which appears to be where you are running `sessionInfo()`. Can you run `sessionInfo()` from within the script? – Stuart R. Jefferys Oct 11 '13 at 20:26
  • Thanks Stuart, I have installed R using the `sudo apt-get ...` command and upgraded the R packages into a private library. I am able to run MR programs (Java and Streaming) and simple R programs successfully as the `vm4learning` user. The user is the same in both cases, so there seems to be no context issue. The problem is running R MR programs using the rmr package as the `vm4learning` user. – Praveen Sripati Oct 12 '13 at 07:17

2 Answers

5

Install the dependencies for rmr2/rhdfs in a system directory instead of a custom directory (~/R/x86_64-pc-linux-gnu-library/3.0). This can be done by running R with sudo and then installing the dependencies. Thanks to Antonio for the help in the RHadoop forums.
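
A rough way to check where each dependency currently lives is sketched below. The helper name `package_library` is hypothetical, and the package list is simply taken from the sessionInfo() output in the question; adjust it to whatever your rmr2/rhdfs versions actually need.

```r
# Report the library directory each package is found in (NA if not found).
package_library <- function(pkgs) {
  vapply(pkgs, function(p) {
    path <- find.package(p, quiet = TRUE)
    if (length(path) == 0) NA_character_ else dirname(path[1])
  }, character(1))
}

# Dependencies listed in the question's sessionInfo() output.
deps <- c("functional", "Rcpp", "RJSONIO", "bitops", "digest",
          "stringr", "plyr", "reshape2", "rJava")
print(package_library(deps))

# Not run here: from an R session started as root (sudo R), install into a
# system library explicitly so the packages do not land in ~/R/... again.
# .Library is R's built-in system library; a site library would also do.
# install.packages(deps, lib = .Library)
```

Any package reported under a home directory (such as ~/R/...) is in a per-user library, which is the situation this answer is about.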

Praveen Sripati
  • I know this is a very old thread, but I am stuck on a similar problem. I have only one node in the cluster, where R packages are getting installed in ~/R/x86_64-pc-linux-gnu-library/3.2. Can you please guide me on how to change the package directory? – Mohitt Jul 22 '15 at 14:04
  • I finally got it working. Thanks for putting up the fix. – Mohitt Jul 23 '15 at 04:41
1

The most common solution to this kind of problem is reinstallation, since in your sessionInfo() output you are getting

**functional_0.4** 

whereas when I ran sessionInfo() I got

functional_0.4

I guess some dependencies are missing, so run the following from your R console

install.packages("functional", dependencies = TRUE)

to fix any problems caused by other packages.

P.S.: Choose the cloud-0 mirror from the available ones.

If that still does not help, I recommend installing r-base-dev as described at http://cran.r-project.org/bin/linux/ubuntu/README, though I can't give a specific reason why it should help:

sudo apt-get install r-base-dev

Thanks

igauravsehrawat
  • I already tried the suggested solutions and none of them worked. r-base has a dependency on r-base-dev, so it is already installed. I have already reinstalled it a couple of times. – Praveen Sripati Oct 12 '13 at 07:04