
I'm working with Apache Spark and IPython, trying to load a CSV file in a notebook, but I get the error: Py4JJavaError: An error occurred while calling o22.load.
While searching I found that loading spark-csv should resolve this. I want to know how to load spark-csv in a notebook on Windows, and also whether there is another way to resolve this error. I have uploaded a screenshot of the error.

(screenshot: Py4JJavaError traceback)

Thomas K
Inam
  • Possible duplicate of [How to add any new library like spark-csv in Apache Spark prebuilt version](http://stackoverflow.com/questions/30757439/how-to-add-any-new-library-like-spark-csv-in-apache-spark-prebuilt-version) – shivsn Jun 15 '16 at 12:37
  • It's not a duplicate. The question you mention asks about adding spark-csv to an Apache Spark prebuilt version; I asked about adding it in a Jupyter notebook, and I also asked about any other method to solve the Py4JJavaError. – Inam Jun 15 '16 at 17:57
  • Just add the jars or the package and your error will be solved; it's a duplicate. – shivsn Jun 15 '16 at 18:35
  • OK, so tell me how to add jars or a package in a Jupyter notebook on Windows? None of the answers to the question you mention work for me... – Inam Jun 15 '16 at 20:05

1 Answer


I had the same issue. This is how I fixed it, using Anaconda 3.5, Jupyter Notebook, and Windows 10:

import os
import sys

# Ask spark-submit to fetch the spark-csv package (and its dependencies)
# from Maven before the JVM starts.
SUBMIT_ARGS = "--packages com.databricks:spark-csv_2.11:1.4.0 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS

spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

# Put Spark's Python sources and its bundled py4j on the import path.
# Note: the paths must be relative to SPARK_HOME, because os.path.join
# discards earlier components when a later one is absolute (e.g. 'C:/...').
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.9-src.zip'))

# Run Spark's interactive-shell bootstrap, which creates the SparkContext `sc` (Python 3).
exec(open(os.path.join(spark_home, 'python/pyspark/shell.py')).read())


from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)  # `sc` was created by shell.py above

# Read the CSV through the spark-csv data source.
df = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load('C:/spark_data/train.csv')

df.show()
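Since the question also asked for another way to resolve the error: if you can upgrade, Spark 2.0+ has CSV support built in (`spark.read.csv(path, header=True)`), so the external spark-csv package is unnecessary there. For Spark 1.x, the sketch below shows one way to assemble the `PYSPARK_SUBMIT_ARGS` string instead of hard-coding it. `build_submit_args` is a hypothetical helper; the Maven coordinates are the ones used in the answer above:

```python
import os

def build_submit_args(packages):
    """Build a PYSPARK_SUBMIT_ARGS value from a list of Maven coordinates.

    Hypothetical helper: joins the coordinates with commas and appends
    'pyspark-shell', which pyspark requires as the final token.
    """
    return "--packages {} pyspark-shell".format(",".join(packages))

# Must be set before the SparkContext (JVM) is started.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(
    ["com.databricks:spark-csv_2.11:1.4.0"]
)
print(os.environ["PYSPARK_SUBMIT_ARGS"])
# --packages com.databricks:spark-csv_2.11:1.4.0 pyspark-shell
```

Setting the environment variable only works if it happens before `shell.py` (or any other code) creates the SparkContext; once the JVM is up, changing it has no effect.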
S.G.