I'm trying to run a script in the pyspark environment but so far I haven't been able to.
How can I run a script the way I would run python script.py, but in pyspark?
You can do: ./bin/spark-submit mypythonfile.py
Running Python applications directly through pyspark is not supported as of Spark 2.0.
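As a minimal sketch (the file name mypythonfile.py comes from the command above; the DataFrame logic is illustrative), a script run with spark-submit has to create its own SparkSession, since the spark object that the interactive shell provides is not predefined in a standalone script:

from pyspark.sql import SparkSession

# Create the session yourself; spark-submit does not inject the
# interactive shell's `spark` object into a standalone script.
spark = SparkSession.builder.appName('mypythonfile').getOrCreate()

# Illustrative work: build a tiny in-memory DataFrame and count it.
df = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'label'])
print(df.count())

spark.stop()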
pyspark 2.0 and later execute the script file named in the environment variable PYTHONSTARTUP, so you can run:
PYTHONSTARTUP=code.py pyspark
Compared to the spark-submit answer, this is useful for running initialization code before starting the interactive pyspark shell.
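A minimal sketch of such a startup file (the file name code.py comes from the command above; the view name and contents are illustrative). Because the shell has already created its session by the time the startup file runs, the file can use spark directly:

# code.py: executed before the interactive prompt appears.
# The pyspark shell has already created `spark` and `sc`,
# so the startup file can use them without any setup.
df = spark.range(10)                      # small example DataFrame
df.createOrReplaceTempView('numbers')     # available in the shell afterwards
print('startup complete: temp view "numbers" registered')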
You can execute script.py as follows:
pyspark < script.py
or, if you want to run pyspark on a YARN cluster:
pyspark --master yarn < script.py
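A sketch of a script suited to piping (the file name script.py comes from the answer above; the computations are illustrative). Since the shell creates spark and sc before reading stdin, the script can use them without any setup:

# script.py, read via `pyspark < script.py`
print(sc.parallelize(range(100)).sum())   # uses the shell's SparkContext
spark.range(5).show()                     # and its SparkSession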
Existing answers are right (that is, use spark-submit), but some of us might want to just get started with a SparkSession object as in the pyspark shell.
So in the pyspark script to be run, first add:
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master('yarn') \
    .appName('pythonSpark') \
    .enableHiveSupport() \
    .getOrCreate()
Then use spark.conf.set('conf_name', 'conf_value')
to adjust runtime configuration. Note that launch-time settings such as executor cores and memory cannot be changed this way once the session exists; set those on the builder (or via spark-submit flags) before getOrCreate().
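A minimal sketch of both kinds of configuration (the specific values here are illustrative):

from pyspark.sql import SparkSession

# Launch-time settings go on the builder, before getOrCreate():
spark = SparkSession.builder \
    .master('yarn') \
    .appName('pythonSpark') \
    .config('spark.executor.memory', '4g') \
    .config('spark.executor.cores', '2') \
    .enableHiveSupport() \
    .getOrCreate()

# Runtime-mutable settings can still be changed afterwards:
spark.conf.set('spark.sql.shuffle.partitions', '64')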
The Spark environment provides a command to execute an application file, whether it is written in Scala or Java (packaged as a JAR), Python, or R. The command is:
$ spark-submit --master <url> <SCRIPTNAME>.py
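For example (the master URL here is illustrative), to run the script locally with four cores:

$ spark-submit --master local[4] <SCRIPTNAME>.py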
I'm running Spark on a 64-bit Windows system with JDK 1.8.