13

I am using PySpark to run some commands in a Jupyter Notebook, but it throws an error. I tried the solutions provided in this link (Pyspark: Exception: Java gateway process exited before sending the driver its port number), such as changing the path to C:Java and uninstalling Java SDK 10 and reinstalling Java 8, but I still get the same error.

I also tried uninstalling and reinstalling pyspark, and running from the Anaconda prompt, but I still get the same error. I am using Python 3.7 and pyspark version 2.4.0.

If I use this code, I get the error "Exception: Java gateway process exited before sending its port number".

from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext() 
sqlContext = SQLContext(sc)
from pyspark.mllib.linalg import Vector, Vectors
from nltk.stem.wordnet import WordNetLemmatizer
from pyspark.ml.feature import RegexTokenizer, StopWordsRemover, Word2Vec

But if I remove the SparkContext from this code, it runs fine. However, I need a SparkContext for my solution. The code below, without a SparkContext, does not throw any error.

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.mllib.linalg import Vector, Vectors
from nltk.stem.wordnet import WordNetLemmatizer
from pyspark.ml.feature import RegexTokenizer, StopWordsRemover, Word2Vec

I would appreciate any help figuring this out. I am using the Windows 10 64-bit operating system.

Here is a picture of the full error.

Avi

6 Answers

9

Type this in your bash terminal, and it should fix the error:

export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

All this does is set the shell environment variable PYSPARK_SUBMIT_ARGS to "--master local[2] pyspark-shell", which tells PySpark to start with a local master using two worker threads.
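
Since the question is about Jupyter on Windows, where `export` is not available, the same variable can also be set from Python before the SparkContext is created. The following is a minimal sketch, assuming the variable only needs to be in place when PySpark launches its Java gateway; the helper name `set_submit_args` is hypothetical and not part of PySpark.

```python
import os


def set_submit_args(master="local[2]"):
    """Set PYSPARK_SUBMIT_ARGS for this Python process and return the value."""
    # Same form as the export command above: options first, "pyspark-shell" last.
    value = "--master {} pyspark-shell".format(master)
    os.environ["PYSPARK_SUBMIT_ARGS"] = value
    return value


set_submit_args()

# Create the SparkContext only after this point, for example:
# from pyspark import SparkContext
# sc = SparkContext()
```

A variable set this way only affects the current notebook kernel, so it has to be repeated (or set system-wide) for other sessions.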

xilpex
5

Try this:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

This worked for me on Linux. It should work on Windows too, although apt-get is not available in a plain Windows command prompt.

Since you are a Windows user, this link will help you: https://superuser.com/questions/947220/how-to-install-packages-apt-get-install-in-windows

1

Actually, this error occurs because JAVA_HOME is not set in the environment variables. I was getting the same error. Instead of setting JAVA_HOME and SPARK_HOME system-wide, you can set them directly in your Python code. For that you need to download JDK 1.8 first. It works for me now. Below is the solution with code:

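import os

# Set both variables before importing findspark/pyspark; adjust the paths to your own install locations.
os.environ["JAVA_HOME"] = "C:/Program Files/Java/jdk1.8.0_45"
os.environ["SPARK_HOME"] = "C:/Users/Rahul/Downloads/spark-3.1.3-bin-hadoop2.7"

import findspark
findspark.init()  # makes the pyspark under SPARK_HOME importable

from pyspark.sql import SparkSession

# Start a local session using all available cores
spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.read.options(delimiter=",", header=True).csv("C:/Users/Downloads/sample_movie.csv")
df.show()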
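
Before starting Spark, it can help to confirm that JAVA_HOME really points at a Java installation. The following is a rough sketch using only the standard library; `find_java` is a hypothetical helper, and it assumes the launcher looks for `bin/java` (or `bin/java.exe`) under JAVA_HOME and otherwise falls back to whatever `java` is on PATH.

```python
import os
import shutil
from pathlib import Path


def find_java(env=None):
    """Return the path of the java executable that would likely be used, or None."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        # JAVA_HOME is set: only accept an executable inside its bin folder.
        for name in ("java.exe", "java"):
            candidate = Path(java_home) / "bin" / name
            if candidate.is_file():
                return str(candidate)
        return None
    # JAVA_HOME is not set: fall back to searching PATH.
    return shutil.which("java", path=env.get("PATH", ""))


if __name__ == "__main__":
    print("java found at:", find_java())
```

If this prints `None`, the "Java gateway process exited" error is the likely result, and fixing JAVA_HOME is the first thing to try.
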
Rahul Pandey
0

How did you install Spark? Clearly, you are having trouble starting a Java process, which is what that error means.

You may want to install Spark again, following the instructions to the letter, wherever you found them. However, you could also use conda (Anaconda or Miniconda), in which case installing pyspark will also get a current Java for you:

conda install pyspark
mdurant
  • btw: it is not surprising that the version without a context worked, you didn't actually try to launch anything yet. – mdurant Mar 31 '19 at 14:18
  • I used pip install pyspark on my anaconda cmd prompt. – Avi Mar 31 '19 at 22:28
  • For Windows 10, I had to reset JAVA_HOME to JDK 1.8 after a recent upgrade to JDK 16 to resolve this issue. Also ensure winutils is for Hadoop 3.2.1 and pyspark 3.1.2. – Binu Jul 05 '21 at 11:02
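
Several replies in this thread come down to which Java version is picked up (Java 8 versus 10 or 16). As a hedged sketch, the major version can be read from the output of `java -version`; both function names below are hypothetical, and the parsing assumes the usual `version "1.8.0_45"` or `version "16.0.1"` format.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and older report themselves as 1.x (for example 1.8.0_45).
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version` and return its major version, or None if java is missing."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # The version banner is normally written to stderr, not stdout.
    return parse_java_major(result.stderr or result.stdout)
```

Calling `installed_java_major()` in the same notebook that fails shows which Java the kernel actually sees; for pyspark 2.4.0, as used in the question, Java 8 is the expected result.
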
0

I faced the same issue. I then installed JDK 8 in a new, separate folder called Java instead of in Program Files, and the issue was resolved.

Data_guy
0

I was able to solve it. Open "Edit the system environment variables" > "Environment Variables" > under "System variables" (the lower half) > double-click "Path" > click "New" and add "C:\WINDOWS\System32" (without quotes).
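
To check whether that entry is already present before editing anything, the PATH value can be inspected from Python. This is a small sketch, not a definitive check: `path_contains` is a hypothetical helper, it compares entries case-insensitively using Windows path rules, and it does not expand placeholders such as %SystemRoot%.

```python
import ntpath
import os


def path_contains(directory, path_value=None, sep=";"):
    """Return True if `directory` appears as an entry in a Windows-style PATH string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalize case, slashes, and trailing separators before comparing.
    wanted = ntpath.normcase(ntpath.normpath(directory))
    entries = [e.strip().strip('"') for e in path_value.split(sep) if e.strip()]
    return any(ntpath.normcase(ntpath.normpath(e)) == wanted for e in entries)


if __name__ == "__main__":
    print(path_contains(r"C:\WINDOWS\System32"))
```

Run it inside the Jupyter kernel that fails, since the kernel's PATH can differ from the one shown in the system dialog until the notebook server is restarted.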