As the size of the input file increases, I am seeing the number of failed shuffles increase and the job completion time grow non-linearly.
For example:
75GB took 1h
86GB took 5h
I also see the average shuffle time increase roughly tenfold.
For example:
75GB 4min
85GB 41min
Can someone point me in a direction to debug this?
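
To put numbers on the non-linearity, here is a rough sketch in Python that compares how much the input grew with how much the time grew. It uses only the figures quoted above; the names `job_runs`, `shuffle_runs` and `growth` are made up for illustration, and the second sizes are kept as written (86GB for the job, 85GB for the shuffle).

```python
# Figures copied from the post: (input size in GB, duration in minutes).
job_runs = [(75, 60), (86, 300)]      # total job time: 1h and 5h
shuffle_runs = [(75, 4), (85, 41)]    # average shuffle time


def growth(runs):
    """Return (size ratio, time ratio) between the first and last run."""
    (size_a, time_a), (size_b, time_b) = runs[0], runs[-1]
    return size_b / size_a, time_b / time_a


for label, runs in (("job", job_runs), ("shuffle", shuffle_runs)):
    size_ratio, time_ratio = growth(runs)
    print(f"{label}: input x{size_ratio:.2f}, time x{time_ratio:.2f}")
```

So an input that is only about 13 to 15 percent larger goes with a job that takes 5 times as long and an average shuffle that takes about 10 times as long.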