I am receiving a `java.lang.OutOfMemoryError: Java heap space` error inside a function (run as a task in Spark) when trying to build a large String. My question is: which Spark parameter should I tune to avoid this error? I can increase the executors' memory, of course, but what is the most memory-efficient way of optimizing this? I am using YARN, so would it be better to just increase `spark.yarn.executor.memoryOverhead` to an amount big enough to avoid this error? Note that my strings are not that big either: maybe 1 or 2 GB only, while I give 6 GB to the executors.
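
For reference, here is a minimal sketch of how the two settings in question can be supplied programmatically; the app name and values are illustrative placeholders, not a recommendation:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class MemoryConfigSketch {
    public static void main(String[] args) {
        // Illustrative values only; "large-string-job" is a placeholder name.
        SparkConf conf = new SparkConf()
                .setAppName("large-string-job")
                .set("spark.executor.memory", "6g")                 // executor JVM heap
                .set("spark.yarn.executor.memoryOverhead", "1024"); // off-heap overhead in MB (YARN)
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job that builds the large String runs here ...
        sc.stop();
    }
}
```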

asked by pythonic
- Check if this might help: http://stackoverflow.com/a/22742982/2299040 – Sahil Manchanda Oct 20 '16 at 09:24
- But how many of those strings are there? Or are there other values that exist in the same scope? – sarveshseri Oct 20 '16 at 09:24
- Also... there is one very important thing you need to know: Strings are immutable in Java. So if you do something like `String s = "a"; s = s + "b"; s = s + "c";`, every concatenation creates a new string, while the old one is left to be garbage collected later. Hence, if you build a large String using concatenation, you will actually allocate many intermediate strings. Use `StringBuilder` to avoid those multiple copies (see the sketch after these comments). – sarveshseri Oct 20 '16 at 09:30
- I know. Of course I am not making strings like that. – pythonic Oct 20 '16 at 11:06
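
As a sketch of the `StringBuilder` approach mentioned in the comments above (the loop bounds, buffer size, and record format are illustrative):

```java
// Building a large String with StringBuilder: appends go into one growing
// buffer instead of creating a new immutable String per concatenation.
public class StringBuildSketch {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(64 * 1024); // pre-size to cut resizes (illustrative)
        for (int i = 0; i < 100_000; i++) {
            sb.append("record-").append(i).append('\n');
        }
        String result = sb.toString(); // single final copy into an immutable String
        System.out.println(result.length());
    }
}
```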