1

Is it possible to connect sparklyr to a remote Hadoop cluster, or can it only be used locally? And if it is possible, how? :)

In my opinion, connecting from R to Hadoop via Spark is very important!

user43348044

2 Answers

0

Do you mean a Hadoop cluster or a Spark cluster? If Spark, you can try connecting through Livy; details are here: https://github.com/rstudio/sparklyr#connecting-through-livy

Note: Connecting to Spark clusters through Livy is under experimental development in sparklyr

michalrudko
  • I mean connecting to Hadoop via Spark. Is this possible? – user43348044 May 22 '17 at 05:21
  • I am not sure what type of Spark installation you have. If it runs on YARN, then Spark can read data from HDFS, so the answer is yes. However, I am afraid you need to provide more information to get a good hint about what you need. – michalrudko May 22 '17 at 22:38
  • OK, thanks. My data are stored in a Cloudera Hadoop cluster. Accessing the data via Hive over a JDBC connection works fine with R. Is this also possible with sparklyr? If yes, how? :) – user43348044 May 23 '17 at 06:33
  • Please check here: https://stackoverflow.com/questions/38102921/can-sparklyr-be-used-with-spark-deployed-on-yarn-managed-hadoop-cluster – michalrudko May 23 '17 at 13:57
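
For the YARN case discussed in the comments above, a connection might look roughly like the sketch below. It assumes R is running on a machine that has a Spark client installed and can reach the cluster (for example an edge node), and that Spark was built with Hive support. The two paths and the table name `my_table` are placeholders, not values taken from this thread.

```r
library(sparklyr)
library(DBI)

# Assumed locations; replace with the paths used on your own cluster.
Sys.setenv(SPARK_HOME = "/opt/cloudera/parcels/CDH/lib/spark")  # hypothetical path
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")                # hypothetical path

# Connect to Spark running on YARN in client mode.
sc <- spark_connect(master = "yarn-client", config = spark_config())

# List the tables visible through the Hive metastore.
dbGetQuery(sc, "SHOW TABLES")

# Reference a Hive table lazily with dplyr; `my_table` is a placeholder name.
my_tbl <- dplyr::tbl(sc, "my_table")
head(my_tbl)

# Close the connection when finished.
spark_disconnect(sc)
```

If the connection fails, checking that `SPARK_HOME` and `HADOOP_CONF_DIR` point at the cluster's actual client configuration is usually a reasonable first step.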
0

You could use Livy, which is a REST API service for the Spark cluster.

Once you have set up your HDInsight cluster on Azure, check for the Livy service using curl:

#curl test
curl -k --user "admin:mypassword1!" -v -X GET 


# RStudio code
library(sparklyr)

# Connect through Livy; the password is requested interactively
sc <- spark_connect(master = "https://<yourclustername>.azurehdinsight.net/livy/",
                     method = "livy", config = livy_config(
                       username = "admin",
                       password = rstudioapi::askForPassword("Livy password:")))

A useful reference: https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-livy-rest-interface

Akshay Kadidal