
I'm trying to import a CSV file using PySpark. I tried this and this.

Using the first method I could read the CSV file, but the number of variables is quite large, so manually typing out each variable name is difficult.
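For reference, the first method I tried looks roughly like this (the file path and column names below are just placeholders for illustration):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext(appName="csv-import")
sqlContext = SQLContext(sc)

lines = sc.textFile("data/myfile.csv")  # placeholder path
parts = lines.map(lambda line: line.split(","))
# Every column name has to be written out by hand here --
# impractical when the file has a large number of columns.
rows = parts.map(lambda p: Row(id=p[0], name=p[1], value=p[2]))
df = sqlContext.createDataFrame(rows)
```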

Using the second method (spark-csv), I could read the CSV file from the command prompt. But when I try the same method in a Jupyter notebook, I get this error:

Py4JJavaError: An error occurred while calling o89.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at http://spark-packages.org
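The call that raises this error in the notebook is of roughly this shape (the path is a placeholder; it assumes an existing SparkContext `sc`):

```python
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # `sc` is the notebook's SparkContext
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")  # so column names don't need typing out
      .load("data/myfile.csv"))
```

The same call works when I launch pyspark from the command prompt with the spark-csv package, so the notebook apparently isn't picking the package up.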

I tried these options as well. I fixed the "conf" file, but I don't know how to set "PACKAGES" and "PYSPARK_SUBMIT_ARGS" in a Windows environment.
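From what I understand, one way is to set the variable from Python before pyspark is imported; a minimal sketch (the spark-csv package version here is an assumption, not something from my setup):

```python
import os

# Must be set before `import pyspark`, because py4j reads it
# when it starts the JVM. The version coordinate is an assumption.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages com.databricks:spark-csv_2.10:1.4.0 pyspark-shell"
)

print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

On Windows the equivalent from the command prompt would be `set PYSPARK_SUBMIT_ARGS=...` before launching Jupyter, but I'm not sure whether that is the right place to set it.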

Could anyone please help me read CSV files into a Spark DataFrame?

Thanks!

  • [This](http://blog.insightdatalabs.com/jupyter-on-apache-spark-step-by-step/) should help you get the spark-csv package working in Jupyter. – Alfredo Gimenez May 17 '16 at 21:14
  • @spiffman: Thanks for your comment and the useful link! I can actually read JSON files properly, but when I try to read CSV I get the error above. Also, I'm working on a standalone cluster on a Windows system, so many of the suggestions found on the internet seem not applicable to me. – Beta May 18 '16 at 07:41

0 Answers