I'm trying to import a CSV file using PySpark. I tried this and this.
Using the first method I could read the CSV file, but the number of variables is quite large, so manually specifying each variable name is impractical.
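For context, the first method looks roughly like this (a sketch assuming a SparkContext `sc` and SQLContext `sqlContext` already exist in the session; the file path and column names are placeholders), which is why it doesn't scale to many columns:

```python
from pyspark.sql import Row

lines = sc.textFile("cars.csv")            # placeholder path
parts = lines.map(lambda l: l.split(","))

# Every column has to be named by hand here -- unmanageable with
# a large number of variables:
cars = parts.map(lambda p: Row(make=p[0], model=p[1], year=int(p[2])))
df = sqlContext.createDataFrame(cars)
```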
Using the second method (spark-csv), I could read the CSV file from the command prompt. But when I tried the same method in a Jupyter notebook, I got this error:
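For reference, this is roughly the load call I'm running in the notebook (a sketch assuming an existing `sqlContext`; the file path is a placeholder). This is the line that raises the error:

```python
# Assumes a SQLContext named sqlContext already exists in the notebook session.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("cars.csv"))  # placeholder path
```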
Py4JJavaError: An error occurred while calling o89.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at http://spark-packages.org
I tried this option as well. I fixed the "conf" file, but I don't know how to set "PACKAGES" and "PYSPARK_SUBMIT_ARGS" in a Windows environment.
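Here is what I tried in the notebook before creating the SparkContext; the package coordinate is the one that worked for me from the command prompt (your Scala/package version may differ):

```python
import os

# Pass the spark-csv package to the JVM before any SparkContext is created.
# The trailing "pyspark-shell" token is required so the submit args are valid.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages com.databricks:spark-csv_2.10:1.5.0 pyspark-shell"
)

# After this, creating the context in the notebook should pick up the package:
# from pyspark import SparkContext
# sc = SparkContext()
```

But I'm not sure this is the right way to set it on Windows, or whether it has to go in the system environment variables instead.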
Could anyone please help me read CSV files into a Spark DataFrame?
Thanks!