
I am a beginner with Spark and am trying to follow the instructions here on how to initialize the Spark shell from Python using cmd: http://spark.apache.org/docs/latest/quick-start.html

But when I run in cmd the following:

C:\Users\Alex\Desktop\spark-1.4.1-bin-hadoop2.4\>c:\Python27\python bin\pyspark 

then I receive the following error message:

File "bin\pyspark", line 21 
export SPARK_HOME="$(cd ="$(cd "`dirname "$0"`"/..; pwd)" 
SyntaxError: invalid syntax

What am I doing wrong here?

P.S. When in cmd I try just C:\Users\Alex\Desktop\spark-1.4.1-bin-hadoop2.4>bin\pyspark

then I receive ""python" is not recognized as internal or external command, operable program or batch file".

Alex

5 Answers


You need to have Python available in the system path; you can add it with setx:

setx path "%path%;C:\Python27"
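
Note that setx records the change for future sessions but does not update the console it runs in, so open a new cmd window before retrying bin\pyspark. To verify the path took effect, for example:

where python
python --version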
maxymoo
  • 35,286
  • 11
  • 92
  • 119

I'm a fairly new Spark user (as of today, really). I am using Spark 1.6.0 on Windows 10 and 7 machines. The following worked for me:

import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.9-src.zip'))

# Python 2 only; on Python 3 use exec(open(path).read()) instead
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))

Using the code above, I was able to launch Spark in an IPython notebook and in my Enthought Canopy Python IDE. Before this, I was only able to launch pyspark through a cmd prompt. The code above will only work if you have your environment variables set correctly for Python and Spark (pyspark).
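
If the snippet runs cleanly, shell.py binds a SparkContext to sc, so a quick smoke test might look like this (the computation itself is just an example):

print(sc.version)
print(sc.parallelize(range(100)).filter(lambda x: x % 2 == 0).count())  # expect 50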

Jon

I run this set of path settings whenever I start pyspark in IPython:

import os
import sys

# R equivalent: Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')
# To pull in extra packages, restart the notebook with:
#   ipython notebook --profile=pyspark --packages com.databricks:spark-csv_2.10:1.0.3
os.environ['SPARK_HOME'] = "G:/Spark/spark-1.5.1-bin-hadoop2.6"

sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/bin")
sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/python")
sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/python/lib")
sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/python/lib/pyspark.zip")
sys.path.append("G:/Spark/spark-1.5.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip")

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext

# sc.stop()  # if you wish to stop an existing context first
sc = SparkContext("local", "Simple App")
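
If the context comes up, a quick way to confirm the SQL side works too (a minimal sketch using the 1.5-era SQLContext API; the column names are just examples):

sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
df.show()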
KarthikS

With the reference and help of the user "maxymoo", I was able to find a way to set a PERMANENT path in Windows 7 as well. The instructions are here:

http://geekswithblogs.net/renso/archive/2009/10/21/how-to-set-the-windows-path-in-windows-7.aspx
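
In short, the steps there amount to: Control Panel -> System -> Advanced system settings -> Environment Variables, then edit the Path entry under System variables.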

Alex

Simply set the path under System -> Environment Variables -> Path:

  • R path in my system: C:\Program Files\R\R-3.2.3\bin
  • Python path in my system: C:\Python27
  • Spark path in my system: C:\spark-2

The paths must be separated by ";" and there must be no spaces between them.
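
With those three entries, the relevant portion of the Path value would look like this (the directories are the ones from this answer):

C:\Program Files\R\R-3.2.3\bin;C:\Python27;C:\spark-2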

Alexei - check Codidact