To run PySpark I installed it with `pip install pyspark`. Now, to initialize the session, after reading many blogs I am running the command below:

import pyspark
spark = pyspark.sql.SparkSession.builder.appName('test').getOrCreate()

The above code gives me the error:

Exception: Java gateway process exited before sending the driver its port number

This will be my first Spark program. I would like your advice on whether `pip install pyspark` is enough to run Spark on my Windows laptop, or whether I need to do something else.

I have Java 8 installed on my laptop, and I am using conda with Python 3.6.

Rahul Sharma
  • No, you need to have a pre-built Spark distribution present on your system, with the environment variable SPARK_HOME set to that directory. You might also need to set HADOOP_HOME if you are doing read operations from Spark. Follow this to do it: https://stackoverflow.com/questions/35652665/java-io-ioexception-could-not-locate-executable-null-bin-winutils-exe-in-the-ha – Lokesh Yadav Dec 28 '17 at 17:56
  • I'm not convinced you actually have Spark installed. You can download it from spark.apache.org. As Lokesh said, use the "pre-built with Hadoop" version. – J'e Dec 28 '17 at 18:41
  • Can you include the entire exception? – Jacek Laskowski Dec 29 '17 at 07:30
  • @16num yes, you are correct, I haven't installed Spark explicitly, apart from pyspark. I just want to check whether PySpark commands run correctly in the current environment. Even for this, do I need to install Spark on my laptop? – Rahul Sharma Dec 29 '17 at 07:49
  • @Lokesh thanks, I will try what you have suggested through the link first and get back in case of queries. – Rahul Sharma Dec 29 '17 at 07:50
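
The environment-variable setup described in the comments can be sketched as below. The paths are hypothetical examples, not from the question; replace them with the actual locations of the Spark distribution, Hadoop `winutils.exe` folder, and JDK on your machine:

```python
import os

# Hypothetical install locations -- adjust to your own machine.
os.environ["SPARK_HOME"] = r"C:\spark\spark-2.2.1-bin-hadoop2.7"
os.environ["HADOOP_HOME"] = r"C:\hadoop"  # folder whose bin\ contains winutils.exe
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_151"

# With these variables in place, the original code should be able to
# launch the JVM gateway:
#   import pyspark
#   spark = pyspark.sql.SparkSession.builder.appName('test').getOrCreate()
print(os.environ["SPARK_HOME"])
```

Setting the variables system-wide (Control Panel → Environment Variables) instead of in Python has the same effect and survives across sessions.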

0 Answers