
I'm trying to build a recommender using Spark and just ran out of memory:

Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space

I'd like to increase the memory available to Spark by modifying the spark.executor.memory property, in PySpark, at runtime.

Is that possible? If so, how?

Update

Inspired by the link in @zero323's comment, I tried to delete and recreate the context in PySpark:

del sc
from pyspark import SparkConf, SparkContext
conf = (SparkConf()
        .setMaster("spark://hadoop01.woolford.io:7077")
        .setAppName("recommender")
        .set("spark.executor.memory", "2g"))
sc = SparkContext(conf=conf)

returned:

ValueError: Cannot run multiple SparkContexts at once;

That's weird, since:

>>> sc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sc' is not defined
Alex Woolford

4 Answers


I'm not sure why you chose the answer above, since it requires restarting your shell and launching it with a different command! Though that works and is useful, there is an in-line solution, which is what was actually being requested. This is essentially what @zero323 referenced in the comments above, but the link leads to a post describing the implementation in Scala. Below is a working implementation specifically for PySpark.

Note: the SparkContext whose settings you want to modify must not have been started yet; otherwise you will need to stop it, modify the settings, and create a new one.

from pyspark import SparkContext

# Set the property before the SparkContext is created.
SparkContext.setSystemProperty('spark.executor.memory', '2g')
sc = SparkContext("local", "App Name")

source: https://spark.apache.org/docs/0.8.1/python-programming-guide.html

P.S. If you need to stop the SparkContext, just use:

SparkContext.stop(sc)

and to double-check the current settings you can use:

sc._conf.getAll()
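
Putting those pieces together, a minimal sketch of the full stop/set/re-create cycle (an illustration assuming a local master and an already-running context named sc, not part of the original answer):

from pyspark import SparkContext

# Stop any context that is already running; a plain `del sc` only removes the
# Python name and leaves the JVM-side context alive.
if 'sc' in globals():
    sc.stop()

# The property must be in place before the new context is created.
SparkContext.setSystemProperty('spark.executor.memory', '2g')

# Re-create the context and confirm the setting took effect.
sc = SparkContext("local", "App Name")
print(sc._conf.get('spark.executor.memory'))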
abby sobh

You can set spark.executor.memory when you start your pyspark shell:

pyspark --num-executors 5 --driver-memory 2g --executor-memory 2g
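
The same flags can be passed to spark-submit if you run a script rather than the interactive shell; a rough sketch, with recommender.py standing in as a placeholder for your application:

spark-submit --num-executors 5 --driver-memory 2g --executor-memory 2g recommender.py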
Minh Ha Pham
  • While this does work, it doesn't address the use case directly because it requires changing how python/pyspark is launched up front. For those who need to solve the inline use case, look to abby's answer. – aaronsteers Jun 05 '19 at 16:05
  • This works better in my case because the in-session change requires re-authentication – alisa Aug 02 '19 at 22:47

Citing this: since 2.0.0 you don't have to use a SparkContext; you can use a SparkSession and its conf method, as below:

spark.conf.set("spark.executor.memory", "2g")
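
If you build the session yourself, the same value can also be supplied through the builder before any executors are launched; a minimal sketch, with the app name and local master used purely for illustration:

from pyspark.sql import SparkSession

# Build (or reuse) a session with the executor memory fixed up front.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("recommender")
         .config("spark.executor.memory", "2g")
         .getOrCreate())

# Read the setting back from the live session.
print(spark.conf.get("spark.executor.memory"))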
Gomes

As far as I know, it isn't possible to change spark.executor.memory at run time. The containers on the data nodes are created even before the Spark context initializes.
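
In other words, the value has to be supplied before the context (and hence the containers) comes up; a minimal sketch using SparkConf, with a local master purely for illustration:

from pyspark import SparkConf, SparkContext

# Executor memory has to be fixed before the SparkContext is created.
conf = (SparkConf()
        .setMaster("local[*]")
        .setAppName("recommender")
        .set("spark.executor.memory", "2g"))
sc = SparkContext(conf=conf)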

avrsanjay