I have a 3-node cluster running Cloudera 5.9 on CentOS 6.7. I need to connect my R packages (running on my laptop) to Spark, which runs in cluster mode on Hadoop (YARN).
However, when I try to connect my local R session to the cluster's Spark through sparklyr's `spark_connect()`, it fails with an error, because it searches for a Spark home on the laptop itself.
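For reference, this is roughly the call that fails. This is only a minimal sketch; the `master` value is my assumption, since sparklyr needs a local Spark installation to find `spark-submit`:

```r
library(sparklyr)

# Fails on the laptop: sparklyr expects a local SPARK_HOME to locate
# spark-submit, even when the master points at the cluster's YARN.
sc <- spark_connect(master = "yarn-client")
```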
I googled and found that we can install SparkR to use R with Spark. However, I have a few questions:
- I have downloaded the tar file from https://amplab-extras.github.io/SparkR-pkg/. Can I copy it directly to my Linux server and install it there?
- Do I have to stop/delete my existing Spark, which is NOT standalone and uses YARN (i.e., it runs in cluster mode)? Or can SparkR simply run on top of it if I install it on the server (see the sketch after this list)?
- Or do I have to run Spark in standalone mode (set up Spark gateways and start the master/workers using the scripts) and install the package from the Linux command line on top of that?
- Once it is installed, will I be able to access it through the Cloudera Manager UI?
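For context, here is a minimal sketch of what I understand "running SparkR on top of the existing YARN Spark" would look like from a cluster node. The `SPARK_HOME` path is only an assumption based on the usual CDH parcel layout, and `sparkR.init()` is the entry point for the Spark 1.x SparkR API:

```r
# Hypothetical sketch, run on a cluster node (not the laptop).
# Assumes SparkR is available under the CDH Spark installation;
# the parcel path below is illustrative, not verified.
Sys.setenv(SPARK_HOME = "/opt/cloudera/parcels/CDH/lib/spark")
Sys.setenv(YARN_CONF_DIR = "/etc/hadoop/conf")

library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))

# Submit to the existing YARN-managed Spark rather than a standalone master
sc <- sparkR.init(master = "yarn-client")
sqlContext <- sparkRSQL.init(sc)
```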
Please help; I am new to this and really need guidance.
Thanks, Shilpa