
I'm trying to build a recommender using Spark and just ran out of memory:

Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space

I'd like to increase the memory available to Spark by modifying the spark.executor.memory property, in PySpark, at runtime.

Is that possible? If so, how?

Update

Inspired by the link in @zero323's comment, I tried to delete and recreate the context in PySpark:

del sc
from pyspark import SparkConf, SparkContext
conf = (SparkConf()
        .setMaster("spark://hadoop01.woolford.io:7077")
        .setAppName("recommender")
        .set("spark.executor.memory", "2g"))
sc = SparkContext(conf=conf)

returned:

ValueError: Cannot run multiple SparkContexts at once;

That's weird, since:

>>> sc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sc' is not defined
Alex Woolford

4 Answers


I'm not sure why you chose the answer above, since it requires restarting your shell and launching it with a different command! Though that works and is useful, there is an in-line solution, which is what was actually being requested. This is essentially what @zero323 referenced in the comments above, but the link leads to a post describing the implementation in Scala. Below is a working implementation specifically for PySpark.

Note: the SparkContext whose settings you want to modify must not have been started yet; otherwise you will need to stop it, modify the settings, and create a new one.

from pyspark import SparkContext

# Set the property before the SparkContext is created.
SparkContext.setSystemProperty('spark.executor.memory', '2g')
sc = SparkContext("local", "App Name")

source: https://spark.apache.org/docs/0.8.1/python-programming-guide.html

P.S. If you need to stop the SparkContext, just use:

SparkContext.stop(sc)

and to double-check the current settings you can use:

sc._conf.getAll()
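
Putting those pieces together, a minimal sketch of the full stop/set/re-create cycle (an illustration assuming a local master and an already-running context named sc, not part of the original answer):

from pyspark import SparkContext

# Stop any context that is already running; a plain `del sc` only removes the
# Python name and leaves the JVM-side context alive.
if 'sc' in globals():
    sc.stop()

# The property must be in place before the new context is created.
SparkContext.setSystemProperty('spark.executor.memory', '2g')

# Re-create the context and confirm the setting took effect.
sc = SparkContext("local", "App Name")
print(sc._conf.get('spark.executor.memory'))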
abby sobh

You can set spark.executor.memory when you start your pyspark shell:

pyspark --num-executors 5 --driver-memory 2g --executor-memory 2g
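
The same flags can be passed to spark-submit if you run a script rather than the interactive shell; a rough sketch, with recommender.py standing in as a placeholder for your application:

spark-submit --num-executors 5 --driver-memory 2g --executor-memory 2g recommender.py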
Minh Ha Pham
  • While this does work, it doesn't address the use case directly because it requires changing how python/pyspark is launched up front. For those who need to solve the inline use case, look to abby's answer. – aaronsteers Jun 05 '19 at 16:05
  • This works better in my case because the in-session change requires re-authentication – alisa Aug 02 '19 at 22:47

Citing this: since 2.0.0 you don't have to use a SparkContext; you can use a SparkSession and its conf method, as below:

spark.conf.set("spark.executor.memory", "2g")
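
If you build the session yourself, the same value can also be supplied through the builder before any executors are launched; a minimal sketch, with the app name and local master used purely for illustration:

from pyspark.sql import SparkSession

# Build (or reuse) a session with the executor memory fixed up front.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("recommender")
         .config("spark.executor.memory", "2g")
         .getOrCreate())

# Read the setting back from the live session.
print(spark.conf.get("spark.executor.memory"))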
Gomes

As far as I know, it isn't possible to change spark.executor.memory at run time. The containers on the data nodes are created even before the Spark context initializes.
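
In other words, the value has to be supplied before the context (and hence the containers) comes up; a minimal sketch using SparkConf, with a local master purely for illustration:

from pyspark import SparkConf, SparkContext

# Executor memory has to be fixed before the SparkContext is created.
conf = (SparkConf()
        .setMaster("local[*]")
        .setAppName("recommender")
        .set("spark.executor.memory", "2g"))
sc = SparkContext(conf=conf)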

avrsanjay