
I used the link below to learn how to run SparkR through RStudio:

http://blog.danielemaasit.com/2015/07/26/installing-and-starting-sparkr-locally-on-windows-8-1-and-rstudio/

I am having trouble with section 4.5.

# Set SPARK_HOME if it is not already defined
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "C:/Apache/spark-2.0.0")
}
# Load the SparkR package bundled with the Spark install
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
# Start a Spark session on all local cores with 1 GB of driver memory
sparkR.session(master = "local[*]", sparkConfig = list(spark.driver.memory = "1g"))

library(SparkR)
sc <- sparkR.session(master = "local")
sqlContext <- sparkRSQL.init(sc)

DF <- createDataFrame(sqlContext, faithful)

The error comes up when I run the createDataFrame() line:

Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
  java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
    at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
    at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
    at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
    at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
    at org.a
In addition: Warning message:
'createDataFrame(sqlContext...)' is deprecated.
Use 'createDataFrame(data, schema = NULL, samplingRatio = 1.0)' instead.
See help("Deprecated") 

I can't really tell what the error means, and any help would be greatly appreciated.

Thanks!

nak5120
  • Could you please share the output of `jps` from your terminal? – Krishna Kalyan Aug 16 '16 at 21:11
  • What is your Spark version? From 1.6.0 on, Spark includes SparkR, and you should not download & install an older SparkR version (1.4.0), as you seem to have done (it will not work). Also, check that `SPARK_HOME` is set and that your `SPARK_HOME/R/lib` directory exists. – desertnaut Aug 17 '16 at 18:21
  • I just tried downloading 1.6.0 but having trouble installing that part. I realized I didn't do that originally so once I figure that out, this question may have more relevance. – nak5120 Aug 18 '16 at 14:02
  • You are having trouble installing what? Spark 1.6? If so, have a look at my answer here http://stackoverflow.com/questions/33887227/how-to-upgrade-spark-to-newer-version/33914992#33914992 – desertnaut Aug 18 '16 at 15:08
  • Just edited the question @desertnaut – nak5120 Aug 18 '16 at 15:57
  • Why do you call `library(SparkR)` & `sparkR.session` twice? Plus, many of the commands you use are deprecated in Spark 2.0 - see the official docs here: http://spark.apache.org/docs/latest/sparkr.html#starting-up-from-rstudio – desertnaut Aug 18 '16 at 16:26
  • I was just testing things out, but yeah, I'll only do `sparkR.session`. – nak5120 Aug 18 '16 at 16:31

1 Answer


Try this:

Sys.setenv(SPARK_HOME = "C://Apache/spark-2.0.0")

You need to use "//" in the path above.
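
For completeness, a minimal Spark 2.0-style startup from RStudio might look like the sketch below (the SPARK_HOME path is an example; adjust it to your install):

# Point R at the local Spark 2.0 install (example path, per the fix above)
Sys.setenv(SPARK_HOME = "C://Apache/spark-2.0.0")

# Load the SparkR package that ships with Spark itself
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

# One sparkR.session() call replaces the old sparkR.init()/sparkRSQL.init() pair
sparkR.session(master = "local[*]", sparkConfig = list(spark.driver.memory = "1g"))

# In Spark 2.0, createDataFrame() no longer takes a sqlContext
DF <- createDataFrame(faithful)
head(DF)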

Penn Rah