4

I'm trying to learn Spark with Python on a Win10 virtual machine. To do that, I'm trying to read data from a CSV file with PySpark, but execution stops at the following:

[screenshot of the run output]

C:\Users\israel\AppData\Local\Programs\Python\Python37\python.exe C:/Users/israel/Desktop/airbnb_python/src/main/python/spark_python/airbnb.py

hello world1

The system cannot find the path specified.

I have read "How to link PyCharm with PySpark?", "PySpark, Win10 - The system cannot find the path specified", "The system cannot find the path specified error while running pyspark", and "PySpark - The system cannot find the path specified", but had no luck implementing the solutions.

I'm using IntelliJ with Python 3.7. This is the run configuration:

[three screenshots of the run configuration]

The code is as follows:

from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql.types import *


if __name__ == "__main__":

    print("hello world1")

    spark = SparkSession \
        .builder \
        .appName("spark_python") \
        .master("local") \
        .getOrCreate()

    print("hello world2")

    path = "C:\\Users\\israel\\Desktop\\data\\listings.csv"

    df = spark.read\
        .format("csv")\
        .option("header", "true")\
        .option("inferSchema", "true")\
        .load(path)

    df.show()

    spark.stop()

It seems the error occurs when the SparkSession is created, but I don't see how the reported error relates to that line. It is worth mentioning that the execution never ends; I have to stop it manually before I can rerun it. Can anyone shed some light on what I'm doing wrong, please?

Israel Rodriguez

2 Answers

1

I'm sure this is not the best solution, but one approach would be to launch your Python interpreter directly from the pyspark launcher script.

It can be found at: %SPARK_HOME%\bin\pyspark

Additionally, if you modify your environment variables while a terminal is open, that terminal does not pick up the new values until it is relaunched. This applies to PyCharm too. If you haven't tried it yet, restarting PyCharm may also help.
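To see which values the running interpreter actually received, something like the following could be run from inside the IDE. It is only a minimal sketch: the helper name check_env_paths is made up for illustration, and the default variable names (SPARK_HOME, JAVA_HOME, HADOOP_HOME) are an assumption about which settings matter for this setup, so adjust them as needed.

```python
import os


def check_env_paths(names=("SPARK_HOME", "JAVA_HOME", "HADOOP_HOME")):
    """Report, for each variable, whether it is set and points to a real directory.

    The default names are an assumption about what a local Spark setup uses.
    """
    report = {}
    for name in names:
        value = os.environ.get(name)
        if value is None:
            report[name] = "not set"
        elif not os.path.isdir(value):
            report[name] = f"set to {value!r}, but that directory does not exist"
        else:
            report[name] = f"ok ({value})"
    return report


if __name__ == "__main__":
    # Print what this process sees, which may differ from the system settings
    # if the IDE was started before the variables were changed.
    for name, status in check_env_paths().items():
        print(f"{name}: {status}")
```

If a variable shows up as "not set" here but is defined in the system settings, the IDE is probably still running with the old environment.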

0

If the error message is written to sys.stderr

This does not answer the actual question,

but I noticed what you said: "but I don't see how the announced error is related to that line..."

So I want to give you a debugging technique for finding the code that generated this message.

According to your first screenshot (of airbnb.py), the error message is "El sistema no puede encontrar la ruta especificada." ("The system cannot find the path specified."). It looks like it was written to sys.stderr.

So my method is to redirect sys.stderr, like this:

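import sys


def the_process():
    ...
    sys.stderr.write('error message')


class RedirectStdErr:
    """Stand-in for sys.stderr that forwards everything to the original stream."""

    def write(self, msg: str):
        if msg == 'error message':
            set_debug_point_at_here = 1  # put the breakpoint on this line
        original.write(msg)
        original.flush()

    def flush(self):
        # Callers may flush sys.stderr directly, so forward that as well
        original.flush()


original = sys.stderr
sys.stderr = RedirectStdErr()

the_process()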


If you set a breakpoint on the line set_debug_point_at_here = 1, the debugger's call stack shows where the message is actually being written from.
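If attaching a debugger is inconvenient, the same idea can be sketched with the standard traceback module, which records the call stack whenever the watched text is written. The names TracingStdErr and needle are made up for this illustration, and the_process is a hypothetical stand-in for the real code. Note that this only sees writes that go through Python's sys.stderr object; output from a child process (such as the one PySpark starts) may bypass it entirely.

```python
import sys
import traceback


class TracingStdErr:
    """Wrap a stream and record the call stack when a target text is written."""

    def __init__(self, wrapped, needle):
        self.wrapped = wrapped  # the original stream
        self.needle = needle    # text to watch for
        self.stacks = []        # captured call stacks, one string per match

    def write(self, msg):
        if self.needle in msg:
            # Capture the frames that led to this write() call
            self.stacks.append("".join(traceback.format_stack()))
        return self.wrapped.write(msg)

    def flush(self):
        self.wrapped.flush()


def the_process():
    # Hypothetical stand-in for the code that emits the message
    sys.stderr.write("error message\n")


if __name__ == "__main__":
    original = sys.stderr
    sys.stderr = tracer = TracingStdErr(original, "error message")
    try:
        the_process()
    finally:
        sys.stderr = original  # always restore the real stream
    for stack in tracer.stacks:
        print(stack)
```

Matching with `in` rather than `==` is a deliberate choice here, since the message may arrive with a trailing newline or as part of a longer string.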

Carson