
I am using Apache Spark and SQLContext in a Jupyter notebook to read and process data from a DataFrame. At the same time, I want to have a second notebook that creates and manipulates another DataFrame.
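For reference, each notebook runs something along these lines (a minimal sketch; the file path and column name are placeholders, and `sc` is the SparkContext created at notebook startup):

```python
from pyspark.sql import SQLContext

# One SQLContext per notebook, built on the notebook's own SparkContext.
sqlContext = SQLContext(sc)

# Read some data and apply a simple transformation (placeholder path/column).
df = sqlContext.read.json("examples/people.json")
df.filter(df["age"] > 21).show()
```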

Additionally, I followed the steps detailed in this post: Link Spark with iPython Notebook, because I want to run Apache Spark from notebooks.

However, when I try to run two notebooks at the same time, I get the following error:

Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError('An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o27))

My Spark configuration is very simple; it just runs locally. For more details, see the following link: Spark's configuration.
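It is roughly equivalent to this (illustrative values; the actual settings are in the linked post):

```python
from pyspark import SparkConf, SparkContext

# Plain local-mode setup, as each notebook would create it.
conf = SparkConf().setMaster("local[*]").setAppName("notebook")
sc = SparkContext(conf=conf)
```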

Therefore, my question is: how can I run two notebooks concurrently with Apache Spark and Jupyter?

Thanks in advance.

Hugo Reyes
  • If you don't depend on the [features provided by `HiveContext`](http://stackoverflow.com/q/33666545/1560062) you can use plain `SQLContext` instead (`from pyspark.sql import SQLContext; sqlContext = SQLContext(sc)`); see the sketch below. – zero323 Feb 10 '16 at 15:50
  • What if I try to create two `RDD`s in two different notebooks? Will this also work even if I read them from a CSV, for example? – Hugo Reyes Feb 10 '16 at 17:50
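A minimal sketch of the `SQLContext` approach suggested in the comments, assuming Spark 1.x where `sc` already exists in each notebook; the CSV path is a placeholder:

```python
from pyspark.sql import SQLContext

# Plain SQLContext needs no Hive support, so it avoids the embedded
# Derby metastore that HiveContext locks for a single connection.
sqlContext = SQLContext(sc)

# Creating RDDs from a CSV works independently per notebook, since
# each notebook runs its own SparkContext.
rdd = sc.textFile("data.csv").map(lambda line: line.split(","))
print(rdd.take(5))
```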
