I'm trying to learn Spark together with Python on a Win10 virtual machine. For that, I'm trying to read data from a CSV file, with PySpark, but stops a the following:
C:\Users\israel\AppData\Local\Programs\Python\Python37\python.exe C:/Users/israel/Desktop/airbnb_python/src/main/python/spark_python/airbnb.py
hello world1
System cannot find the specified route
I have read How to link PyCharm with PySpark? , PySpark, Win10 - The system cannot find the path specified , The system cannot find the path specified error while running pyspark , PySpark - The system cannot find the path specified but haven't found luck implementing the solutions.
I'm using IntelliJ, python 3.7. This is the run configuration.
I'm using IntelliJ, python 3.7. The code is as follows
from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql.types import *
if __name__ == "__main__":
print("hello world1")
spark = SparkSession \
.builder \
.appName("spark_python") \
.master("local") \
.getOrCreate()
print("hello world2")
path = "C:\\Users\\israel\\Desktop\\data\\listings.csv"
df = spark.read\
.format("csv")\
.option("header", "true")\
.option("inferSchema", "true")\
.load(path)
df.show()
spark.stop()
It seems like the error is in the SparkSession, but I don't see how the announced error is related to that line. It is worth to mention that the execution never ends, I have to manually stop the execution to rerun it. Can anyone give me lights on what I'm doing wrong?. Please