I would like to use Spyder with pyspark (spark-2.1.1), but I cannot get past a rather frustrating Java error. I launch Spyder from the command line on Windows 10 after activating a conda environment (the Python version is 3.5.3). This is my code:
import pyspark

# Create a local SparkContext and run a simple word count on a log file
sc = pyspark.SparkContext("local")
file = sc.textFile("C:/test.log")
words = file.flatMap(lambda line: line.split(" "))
words.count()
When I try to define sc, I get the following error:
File "D:\spark-2.1.1-bin-hadoop2.7\python\pyspark\java_gateway.py", line 95, in launch_gateway
raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
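One workaround I have seen suggested for this exact message is to set the environment variables from inside the script, before the SparkContext is created, so that the gateway launcher is guaranteed to see them. A minimal sketch of what I mean, using the paths from my setup below and assuming that values placed in os.environ are picked up by launch_gateway:

import os

# Assumption: values set in os.environ before the SparkContext is created
# are inherited by the Java gateway process that pyspark launches
os.environ["SPARK_HOME"] = "D:\\spark-2.1.1-bin-hadoop2.7"
os.environ["JAVA_HOME"] = "C:\\Java\\jdk1.8.0_121"

import pyspark
sc = pyspark.SparkContext("local")

I mention it mainly in case it points at the underlying cause.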
For the sake of completeness:
if I run
pyspark
from the command line after activating the conda environment, it works and correctly performs the word count task. If I launch the Spyder desktop app from the Start Menu in Windows 10, everything works too (but in that case I don't think I can load the right Python modules from my conda environment).
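To compare the two ways of launching Spyder, I can at least check which interpreter each instance is actually running (sys.executable reports the path of the current Python binary):

import sys

# Prints the path of the Python binary running this Spyder session,
# e.g. the conda environment's python.exe vs. the base Anaconda one
print(sys.executable)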
The relevant environment variables seem to be OK:
echo %SPARK_HOME%
D:\spark-2.1.1-bin-hadoop2.7
echo %JAVA_HOME%
C:\Java\jdk1.8.0_121
echo %PYTHONPATH%
D:\spark-2.1.1-bin-hadoop2.7\python;D:\spark-2.1.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip; D:\spark-2.1.1-bin-hadoop2.7\python\lib; C:\Users\user\Anaconda3
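To double-check that these variables actually reach the Python process that Spyder starts (and not just the shell I echoed them from), I can print them from within Spyder:

import os

# If any of these prints None, the Spyder-launched interpreter
# never inherited that variable from the activated conda environment
for name in ("SPARK_HOME", "JAVA_HOME", "PYTHONPATH"):
    print(name, "=", os.environ.get(name))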
I have already tried the solutions proposed here, but nothing has worked for me. Any suggestion is greatly appreciated!