I have a Spark job that throws "java.lang.OutOfMemoryError: GC overhead limit exceeded".
The job processes a file that is about 4.5 GB in size.
I've tried the following Spark configuration:
--num-executors 6 --executor-memory 6G --executor-cores 6 --driver-memory 3G
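For context, this is roughly how I'm submitting the job (the class name, jar, and input path below are placeholders, not the real ones):

```
# Sketch of the submit command; class, jar, and input path are placeholders
spark-submit \
  --num-executors 6 \
  --executor-memory 6G \
  --executor-cores 6 \
  --driver-memory 3G \
  --class com.example.ProcessFileJob \
  process-file-job.jar \
  /path/to/the-4.5G-input-file
```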
I've also tried increasing the number of executors and cores, which sometimes works, but then the job takes over 20 minutes to process the file.
Is there anything I can do to improve the performance, or to avoid the Java heap issue?