I'm running a Spark job on EMR with Spark 1.6, and as shown below there is plenty of memory available on the executors.
Even though there is quite a lot of memory available, I see the shuffle spilling to disk, as shown below. What I'm attempting is a join of three datasets using the DataFrames API.
I looked at the documentation and also experimented with `spark.memory.fraction` and `spark.memory.storageFraction`, but that did not seem to help.
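For context, this is roughly how those settings can be passed at submit time. The executor size, partition count, and script name below are illustrative placeholders, not values from this job; raising `spark.sql.shuffle.partitions` is one common way to shrink per-task shuffle data and reduce spill:

```shell
# Sketch of a spark-submit invocation (Spark 1.6).
# All values here are illustrative assumptions, not taken from the actual job.
spark-submit \
  --conf spark.memory.fraction=0.75 \
  --conf spark.memory.storageFraction=0.3 \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.executor.memory=8g \
  my_join_job.py   # hypothetical script name
```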
Any help would be greatly appreciated. Thanks.