I am receiving a `java.lang.OutOfMemoryError: Java heap space` error inside a function (run as a task in Spark) when trying to build a large String. My question is: which Spark parameter should I tune to avoid this error? I can increase the executors' memory, of course, but what is the most memory-efficient way of optimizing this? I am using YARN, so would it be better to just increase `spark.yarn.executor.memoryOverhead` to an amount big enough to avoid this error? Note that my strings are not that big either: maybe 1 or 2 GB only, while I give 6 GB to the executors.
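
For reference, here is a minimal sketch of how the two settings in question can be supplied programmatically; the app name and values are illustrative placeholders, not a recommendation:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class MemoryConfigSketch {
    public static void main(String[] args) {
        // Illustrative values only; "large-string-job" is a placeholder name.
        SparkConf conf = new SparkConf()
                .setAppName("large-string-job")
                .set("spark.executor.memory", "6g")                 // executor JVM heap
                .set("spark.yarn.executor.memoryOverhead", "1024"); // off-heap overhead in MB (YARN)
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job that builds the large String runs here ...
        sc.stop();
    }
}
```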

asked by pythonic
- Check if this might help: http://stackoverflow.com/a/22742982/2299040 – Sahil Manchanda Oct 20 '16 at 09:24
- But how many of those strings are there? Or are there other values that exist in the same scope? – sarveshseri Oct 20 '16 at 09:24
- Also... there is one very important thing you need to know: Strings are immutable in Java. So if you do something like `String s = "a"; s = s + "b"; s = s + "c";`, every concatenation creates a new string, while the old one is left to be garbage collected later. Hence, if you build a large String using concatenation, you will actually allocate many intermediate strings. Use `StringBuilder` to avoid those multiple copies (see the sketch after these comments). – sarveshseri Oct 20 '16 at 09:30
- I know. Of course I am not making strings like that. – pythonic Oct 20 '16 at 11:06
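
As a sketch of the `StringBuilder` approach mentioned in the comments above (the loop bounds, buffer size, and record format are illustrative):

```java
// Building a large String with StringBuilder: appends go into one growing
// buffer instead of creating a new immutable String per concatenation.
public class StringBuildSketch {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(64 * 1024); // pre-size to cut resizes (illustrative)
        for (int i = 0; i < 100_000; i++) {
            sb.append("record-").append(i).append('\n');
        }
        String result = sb.toString(); // single final copy into an immutable String
        System.out.println(result.length());
    }
}
```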