I am trying to read a text file that lives locally using PySpark, and it tells me the file does not exist:
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sc._conf.setMaster("local[*]")
sc.setLogLevel("DEBUG")
sqlContext = SQLContext(sc)
inpath = 'file:///path/to/file'  # file:// should point Spark at the local filesystem
input_data = sqlContext.read.text(inpath)
and I get this:
Py4JJavaError: An error occurred while calling o52.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, <hostname>): java.io.FileNotFoundException: File file:/path/to/file does not exist
I understand that when running on a cluster you need to adjust the Spark configuration in order to read local files. But this file is sitting on the master node, and it does not need to be distributed across all nodes.
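To spell out what I mean: my understanding is that on a cluster the path would normally point at HDFS so every executor can reach it, while the file:// scheme tells Spark to read from the local filesystem instead (both paths below are placeholders):

# what I would use if the file were on HDFS, visible to every executor (placeholder path)
df_hdfs = sqlContext.read.text('hdfs:///path/to/file')

# what I am actually doing: reading from the local filesystem of the machine running the task
df_local = sqlContext.read.text('file:///path/to/file')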
I checked out this question, "How to load local file in sc.textFile, instead of HDFS", and tried the suggestion to set sc._conf.setMaster("local[*]"), but that did not help: after restarting the Spark context and rerunning, it still does not work.
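Concretely, the restart looked roughly like this (a sketch of what I did; creating a SparkConf with the master set before building the new context is my reading of that suggestion):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

sc.stop()  # stop the old context so the new master setting can take effect

conf = SparkConf().setMaster("local[*]")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

input_data = sqlContext.read.text('file:///path/to/file')
input_data.show()  # still fails with the FileNotFoundException above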
Is there any other setting I can change so that this can work?