I have a basic Spark job that does a couple of joins. The 3 DataFrames that get joined are fairly big, with nearly 2 billion records in each. My Spark infrastructure automatically scales up nodes whenever necessary. It is a very simple Spark SQL query whose results I write to disk, but the job always gets stuck at 99% when I look at it in the Spark UI.
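For context, the job is essentially the following (a minimal sketch; the paths, column names, and join key are placeholders, not my real ones):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("three-way-join").getOrCreate()

// Three inputs, each with ~2 billion records (paths are placeholders)
val dfA = spark.read.parquet("/data/a")
val dfB = spark.read.parquet("/data/b")
val dfC = spark.read.parquet("/data/c")

dfA.createOrReplaceTempView("a")
dfB.createOrReplaceTempView("b")
dfC.createOrReplaceTempView("c")

val result: DataFrame = spark.sql("""
  SELECT a.*, b.some_col, c.other_col
  FROM a
  JOIN b ON a.join_key = b.join_key
  JOIN c ON a.join_key = c.join_key
""")

// Writing the joined result to disk is the stage that hangs at 99%
result.write.mode("overwrite").parquet("/path/to/output")
```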
A bunch of things I have tried:

- Increasing the number of `executors` and the `executor memory`.
- Using `repartition` while writing the file.
- Using the native Spark `join` instead of a Spark SQL `join`, etc. (rough sketches of these attempts follow the list).
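Concretely, the attempts looked roughly like this (the partition count, memory settings, and paths are placeholder values I picked for illustration):

```scala
// Attempt 1: more executors and memory, e.g. via spark-submit flags:
//   spark-submit --num-executors 200 --executor-memory 16g ...

// Attempt 2: repartition before the write
result.repartition(2000).write.mode("overwrite").parquet("/path/to/output")

// Attempt 3: native DataFrame API joins instead of the SQL string
val resultDf = dfA
  .join(dfB, Seq("join_key"))
  .join(dfC, Seq("join_key"))
resultDf.write.mode("overwrite").parquet("/path/to/output")
```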
However, none of these has worked. It would be great if somebody could share their experience of solving this kind of problem. Thanks in advance.