
The Spark documentation shows how a Spark package can be added:

sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")

I believe this can only be used when initialising the session.

How can we add spark packages for SparkR using a notebook on DSX?

Chris Snow

1 Answer


Please use the PixieDust package manager to install the Avro package:

pixiedust.installPackage("com.databricks:spark-avro_2.11:3.0.0")

http://datascience.ibm.com/docs/content/analyze-data/Package-Manager.html
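For context, the string passed to installPackage is a Maven coordinate of the form group:artifact:version, and Scala libraries conventionally append the Scala binary version to the artifact name (the "_2.11" here). Below is a minimal sketch in plain Python that splits the coordinate into those parts. The names parse_coordinate and MavenCoordinate are hypothetical helpers made up for illustration; they are not part of pixiedust or Spark.

```python
from typing import NamedTuple, Optional


class MavenCoordinate(NamedTuple):
    """Hypothetical container for the parts of a Maven coordinate."""
    group_id: str
    artifact_id: str
    version: str
    scala_version: Optional[str]  # taken from an "_2.11"-style suffix, if present


def parse_coordinate(coordinate: str) -> MavenCoordinate:
    """Split 'group:artifact:version' into its parts (illustrative helper)."""
    parts = coordinate.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'group:artifact:version', got {coordinate!r}")
    group_id, artifact_id, version = parts
    # Look for a trailing "_<digits...>" suffix on the artifact name.
    _, sep, suffix = artifact_id.rpartition("_")
    scala_version = suffix if sep and suffix[:1].isdigit() else None
    return MavenCoordinate(group_id, artifact_id, version, scala_version)


coord = parse_coordinate("com.databricks:spark-avro_2.11:3.0.0")
print(coord.group_id, coord.artifact_id, coord.version, coord.scala_version)
```

The Scala suffix is worth checking because the package generally has to match the Scala version your Spark instance was built with.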

Install it from a Python kernel (Spark 1.6), since pixiedust can be imported in Python. (Remember that this installs the package at the level of your Spark instance.) Once it is installed, restart the kernel, switch to the R kernel, and read the Avro file like this:

df1 <- read.df("episodes.avro", source = "com.databricks.spark.avro", header = "true")

head(df1)

Complete notebook:

https://github.com/charles2588/bluemixsparknotebooks/raw/master/R/sparkRPackageTest.ipynb

Thanks, Charles.

charles gomes