
I'm using PySpark on my Linux computer. My Spark version is 2.4.4.

I have a small script that initializes the basic entry points, including SparkContext, SQLContext, and SparkSession. This is the code.

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)  # sc is the SparkContext the pyspark shell already created

import pyspark.sql.functions as sqlfunc

I don't want to type this every time I open PySpark. Thus, I would like to

a) run this script in my terminal

b) continue working in my PySpark interactive shell

How can I do this?

I read the following thread to learn how to run a PySpark script from my terminal:

https://stackoverflow.com/a/48861241/12170242

It runs the script, but it doesn't open the PySpark shell, so it's not exactly what I want.

Iterator516

2 Answers


SparkSession is a unified entry point, and the PySpark shell already creates one for you, so there is no need to initialize it again.

As for the other part, loading that functionality by default, you can run a file from inside the shell with this:

>>> execfile("<some name>.py")

You can put all the required Spark code in that file and execute it. For example:

vi scrp.py

df1 = sc.parallelize([[1,2,3], [2,3,4]]).toDF(("a", "b", "c"))
df1.show()

from pyspark.sql.functions import *

Then, in the PySpark shell:

>>> execfile("scrp.py")
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  2|  3|
|  2|  3|  4|
+---+---+---+

>>> df1.show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  2|  3|
|  2|  3|  4|
+---+---+---+

>>> df1.withColumn("news", when(col("a") > 1, "t")).show()
+---+---+---+----+
|  a|  b|  c|news|
+---+---+---+----+
|  1|  2|  3|null|
|  2|  3|  4|   t|
+---+---+---+----+
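execfile is a Python 2 built-in that was removed in Python 3, so the transcript above only works in a PySpark shell running on Python 2. Below is a minimal sketch of a Python 3 stand-in; the helper name run_file is hypothetical and not part of PySpark. It compiles the file with its real path (so tracebacks point into the file) and executes it in the namespace you pass, so passing globals() at the shell prompt should leave df1, the star-imported functions, and so on defined afterwards.

```python
def run_file(path, namespace):
    """Hypothetical helper: a rough Python 3 stand-in for Python 2's execfile().

    Executes the Python source at `path` inside the dict `namespace`, so every
    name the file defines (variables, imports, star imports) remains in
    `namespace` afterwards. Pass globals() at the interactive prompt to make
    those names available in the shell.
    """
    with open(path) as f:
        source = f.read()
    # Compiling with the real filename makes tracebacks point at the file.
    code = compile(source, path, "exec")
    exec(code, namespace)
    return namespace
```

Usage at the Python 3 shell prompt, once the helper is defined: `>>> run_file("scrp.py", globals())`. Without defining a helper at all, the usual one-line equivalent is `>>> exec(open("scrp.py").read())`.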

Hope it helps.

Sarath Chandra Vema

When you open a PySpark shell, a SparkSession and a SparkContext are already available as spark and sc, respectively.

SparkSession is available from Apache Spark 2.0 onward; earlier versions provide only sc as the SparkContext.

EDIT:

You can write the code that imports everything and creates the SparkContext, SQLContext, etc. in a file, and then start the Python shell in interactive mode:

python -i yourfile.py
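As a sketch of what yourfile.py might contain, assuming pyspark is importable from a plain python interpreter (for example installed with pip, or with $SPARK_HOME/python and its py4j dependency on PYTHONPATH), the following recreates the names from the question. SparkSession.builder.getOrCreate() returns the already-active session if there is one and starts a new one otherwise; the app name "interactive-session" is an arbitrary placeholder.

```python
# yourfile.py: start with `python -i yourfile.py`; the names below stay
# defined at the interactive prompt once the script finishes.
from pyspark.sql import SparkSession, SQLContext
import pyspark.sql.functions as sqlfunc

# Reuse an active SparkSession if one exists, otherwise start a new one.
# "interactive-session" is a placeholder app name.
spark = SparkSession.builder.appName("interactive-session").getOrCreate()

# The same names the pyspark shell and the question's script provide.
sc = spark.sparkContext
sqlContext = SQLContext(sc)
```

After `python -i yourfile.py` finishes, spark, sc, sqlContext and sqlfunc should all be available at the >>> prompt.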
pissall