
I have a Spark job that throws "java.lang.OutOfMemoryError: GC overhead limit exceeded".

The job is trying to process a file of about 4.5 GB.

I've tried the following Spark configuration:

--num-executors 6  --executor-memory 6G --executor-cores 6 --driver-memory 3G 

I tried increasing the number of cores and executors, which sometimes works, but it then takes over 20 minutes to process the file.

Could I do something to improve the performance, or stop the Java heap issue?
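For reference, those flags correspond roughly to the Spark configuration keys below (just a sketch; the application name is a placeholder, and in practice everything is passed via spark-submit):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Rough SparkConf equivalent of the spark-submit flags above.
// The app name is a placeholder; note that spark.driver.memory only takes
// effect if it is set before the driver JVM starts.
val conf = new SparkConf()
  .setAppName("file-processing-job")      // placeholder name
  .set("spark.executor.instances", "6")   // --num-executors 6
  .set("spark.executor.memory", "6g")     // --executor-memory 6G
  .set("spark.executor.cores", "6")       // --executor-cores 6
  .set("spark.driver.memory", "3g")       // --driver-memory 3G
val sc = new SparkContext(conf)
```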

diplomaticguru
  • Have you tried allocating more heap size at runtime? – Mark Jun 15 '15 at 19:08
  • Identify which operation is causing the OOME and try to do it differently. Post on SO for help. – Jean Logeart Jun 15 '15 at 19:11
  • GC overhead limit exceeded means that the JVM is not able to reclaim any considerable amount of memory after a GC pause. This indicates some kind of memory leak. You may have luck tuning the heap-size parameter `spark.executor.memory`; I do not think it is really getting set by your --executor-memory parameter. Take a look at this SO post as well: http://stackoverflow.com/questions/26562033/how-to-set-apache-spark-executor-memory – ring bearer Jun 15 '15 at 19:19
  • Are you caching the RDDs? – Vijay Innamuri Jun 16 '15 at 04:44
  • @Mark - tried that, but the problem does show up now and again. – diplomaticguru Jun 16 '15 at 10:36
  • @ringbearer, tried that but the result is the same. – diplomaticguru Jun 16 '15 at 10:37
  • @VijayInnamuri, yes I'm caching. Initially I cached it in memory, but later persisted it with MEMORY_AND_DISK. I noticed that stages were failing due to lost executors, so they were being recomputed, which degraded the performance. – diplomaticguru Jun 16 '15 at 10:39
  • Does your cluster have enough memory to process this dataset? – Vijay Innamuri Jun 16 '15 at 11:27
  • Make sure that `spark.memory.fraction=0.6`. If it is higher than that, you run into garbage collection errors; see https://stackoverflow.com/a/47283211/179014. (A sketch of both configuration suggestions follows these comments.) – asmaier Nov 14 '17 at 10:24
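Putting the two configuration suggestions from the comments in one place (a sketch only; the values are the ones quoted above, and `spark.memory.fraction` only exists from Spark 1.6 onwards):

```scala
// Sketch of the settings mentioned in the comments (values as quoted above).
val conf = new org.apache.spark.SparkConf()
  .set("spark.executor.memory", "6g")    // set the executor heap explicitly
  .set("spark.memory.fraction", "0.6")   // keep at (or below) the 0.6 default to avoid GC pressure
```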

2 Answers


The only solution is to fine-tune the configuration.

From my experience, the following points help with OOM (a short sketch putting them together follows the list):

  • cache an RDD only if you are going to use it more than once

If you still need to cache, then analyze the data and the application with respect to the available resources.

  • If your cluster has enough memory, then increase spark.executor.memory to its maximum
  • Increase the number of partitions to increase the parallelism
  • Increase the memory dedicated to caching via spark.storage.memoryFraction. If a lot of shuffle memory is involved, try to avoid that or split the allocation carefully
  • Spark's caching feature persist(MEMORY_AND_DISK) comes at the cost of additional processing (serializing, writing and reading back the data); CPU usage will usually be quite high in this case
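A rough sketch putting these points together (assuming an existing SparkContext `sc`; the input path, partition count and transformation are only placeholders):

```scala
import org.apache.spark.storage.StorageLevel

// More partitions -> smaller tasks and more parallelism (200 is a placeholder).
val lines = sc.textFile("hdfs:///path/to/input").repartition(200)

// Persist only because the RDD is reused by two actions below; MEMORY_AND_DISK
// trades extra CPU (serialization, disk I/O) for not recomputing lost partitions.
val records = lines.map(_.split("\t")).persist(StorageLevel.MEMORY_AND_DISK)

val total  = records.count()    // first action
val sample = records.take(10)   // second action reuses the persisted data
```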
Vijay Innamuri
  1. You can try increasing the driver-memory. If you don't have enough memory overall, maybe you can take it from the executor-memory.

  2. Check the Spark UI (available on port 4040) to see what the scheduler delay is. If the scheduler delay is high, quite often the driver is shipping a large amount of data to the executors, which needs to be fixed, for example by broadcasting the data (see the sketch below).
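A minimal sketch of that fix using a broadcast variable (assuming an existing SparkContext `sc`; the lookup map and RDD are hypothetical):

```scala
// Broadcast large read-only data once per executor instead of shipping it
// inside every task closure. The lookup map and RDD below are hypothetical.
val lookup = Map("a" -> 1, "b" -> 2)                        // imagine this map is large
val lookupBc = sc.broadcast(lookup)

val keys   = sc.parallelize(Seq("a", "b", "c"))
val values = keys.map(k => lookupBc.value.getOrElse(k, 0))  // tasks read the broadcast copy
println(values.collect().toSeq)
```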

SanS
  • I already tried increasing the driver-memory, but no joy. There is no scheduler delay; the job starts running within 5-10 seconds. – diplomaticguru Jun 16 '15 at 10:41