Questions tagged [rhadoop]

RHadoop is a combination of R and Hadoop for managing and analyzing data with Hadoop

RHadoop is a collection of three R packages that allow users to manage and analyze data with Hadoop. The packages have been implemented and tested on Cloudera's distribution of Hadoop (CDH3 and CDH4) with R 2.15.0. The packages have also been tested with Revolution R 4.3, 5.0, and 6.0. For rmr, see Compatibility.

Source: GitHub: Revolution Analytics (RHadoop)
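A minimal rmr2 sketch of the pattern these questions revolve around — writing data to HDFS, running a MapReduce job, and reading the result back. This assumes a working Hadoop installation with the `HADOOP_CMD` and `HADOOP_STREAMING` environment variables set, and the rmr2 package installed; it will not run without a Hadoop cluster.

```r
library(rmr2)

# Write a small vector of integers to HDFS as the job input.
small.ints <- to.dfs(1:10)

# Run a map-only job that emits (n, n^2) key/value pairs.
result <- mapreduce(
  input = small.ints,
  map   = function(k, v) keyval(v, v^2)
)

# Read the output back from HDFS into the local R session.
from.dfs(result)
```

The same `to.dfs`/`mapreduce` pattern appears in several of the questions below (e.g. the CSV-from-HDFS and broken-pipe questions), where failures usually trace back to environment variables, container memory limits, or user permissions rather than the R code itself.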

112 questions
8
votes
2 answers

Container is running beyond virtual memory limits

When I run the RHadoop example, the errors below occur. is running beyond virtual memory limits. Current usage: 121.2 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container. Container killed on request. Exit code is…
yes89929
  • 319
  • 1
  • 4
  • 11
7
votes
4 answers

How to install RHadoop packages (Rmr, Rhdfs, Rhbase)?

I am trying my best to integrate Hadoop with R, but I got this error: packages ‘rmr’, ‘rJava’, ‘RJSONIO’, ‘rhdfs’, ‘rhbase’, ‘plyrmr’ are not available (for R version 3.1.3). Steps to integrate Hadoop with R: installed R and Hadoop in…
Venu A Positive
  • 2,992
  • 2
  • 28
  • 31
7
votes
2 answers

Streaming Command Failed! in RHADOOP

I have installed RHadoop in a Hortonworks VM. When I run MapReduce code to verify it, it throws an error. I am using the user rstudio (not root, but it has sudoer access). Streaming Command Failed! Can anybody help me understand the…
Amaresh
  • 3,231
  • 7
  • 37
  • 60
7
votes
2 answers

R+Hadoop: How to read CSV file from HDFS and execute mapreduce?

In the following example: small.ints = to.dfs(1:1000) mapreduce( input = small.ints, map = function(k, v) cbind(v, v^2)) The data input for the mapreduce function is an object named small.ints, which refers to blocks in HDFS. Now I have a…
Hao Huang
  • 221
  • 4
  • 16
4
votes
2 answers

R-Hadoop integration - how to connect R to remote hdfs

I have a case where I will be running R code on data that will be downloaded from Hadoop. Then the output of the R code will be uploaded back to Hadoop as well. Currently I am doing this manually, and I would like to avoid this manual…
KTY
  • 709
  • 1
  • 9
  • 17
4
votes
1 answer

RHadoop: REDUCE capability required is more than the supported max container capability in the cluster

Has anybody had a similar issue with R (build 1060) on top of a sandbox Hadoop (Cloudera 5.1/Hortonworks 2.1)? It seems to be a problem of the new R/Hadoop, because on CDH5.0 it…
yottalab
  • 76
  • 1
  • 5
3
votes
1 answer

Rhadoop basic task on a single machine

I'm running the following code in…
Ashkan
  • 31
  • 1
  • 2
3
votes
2 answers

install.packages("methods") failed for R 3.0.1

I am trying to install the R package "methods" on R 3.0.1: > install.packages("methods") > Warning message: package ‘methods’ is not available (for R version 3.0.1) Is there any way to install 'methods' on R 3.0.1, or should I switch to R 3.0.0? Thank you
sunny
  • 1,887
  • 3
  • 29
  • 40
3
votes
1 answer

Problems running simple rhadoop jobs - broken pipe error

I have a Hadoop cluster set up with the rmr2 and rhdfs packages installed. I've been able to run some sample MR jobs through the CLI and through R scripts. For example, this works: #!/usr/bin/env Rscript require('rmr2') small.ints =…
Ilion
  • 6,772
  • 3
  • 24
  • 47
2
votes
0 answers

Unable to establish connection between Impala and Rstudio using rimpala.connect()

I am unable to establish a connection between Impala and RStudio. I am using the Cloudera QuickStart VM for Cloudera Manager and RStudio. Please see the code below and advise if anything could be…
Enno Victor
  • 41
  • 1
  • 2
  • 7
2
votes
0 answers

Unable to connect R rhdfs APIs with Hadoop Cluster, which is running on different IP address

Add Hadoop home: Sys.setenv("HADOOP_HOME"="ssh://root@192.168.10.70/home/easy/hadoop") Set HADOOP CMD…
Prabhat Jain
  • 321
  • 1
  • 4
  • 9
2
votes
2 answers

"fatal error: TProcessor.h: No such file or directory" when trying to install Rhbase package

Everyone, I'm trying to install the Rhbase package. At first I was missing the thrift package, which I solved, but now it shows me another error. I added TProcessor.h into ../lib/cpp/src/thrift/processor/, but it didn't help and it shows me the same error:…
Andrea
  • 21
  • 4
2
votes
2 answers

How to run sparkR in 64-bit mode

I've installed Spark 1.4.1 (with R version 3.1.3). I am currently testing SparkR to run statistical models. I'm able to run some sample code such as: Sys.setenv(SAPRK_HOME =…
Vijay_Shinde
  • 1,332
  • 2
  • 17
  • 38
2
votes
1 answer

Can I use readLines in mapreduce job in Rhadoop?

I'm trying to read a text or gz file from HDFS and run a simple MapReduce job (actually only the map job), but I got an error which suggests that the readLines part doesn't work. I'm seeking answers on whether I can use the readLines function in mapreduce. P.S.…
chelsea
  • 21
  • 1
2
votes
0 answers

Error in FUN(X[[2L]], ...) : Sorry, parameter type `NA' is ambiguous or not supported

I am trying the below R script to build a logistic regression model using RHadoop (the rmr2 and rhdfs packages) on an HDFS data file located at "hdfs://:/somnath/merged_train/part-m-00000", and then to test the model using a test HDFS data file at…
somnathchakrabarti
  • 3,026
  • 10
  • 69
  • 92