
I recently implemented a change so that my Spark job reads only a fraction of the data it originally needed. In testing, the change shows up as substantially reduced shuffle read sizes. However, the job consistently takes ~30% longer than in production, which I attribute to longer executor compute times on the event timeline (the shuffle read time is much shorter than in production).
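
For context, the change is conceptually along these lines. This is a minimal sketch, not my actual job; the paths, column names, and predicate are hypothetical stand-ins:

```scala
import org.apache.spark.sql.SparkSession

object FractionalReadJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fractional-read-sketch")
      .getOrCreate()

    // Hypothetical input; the real job reads a much larger dataset.
    val events = spark.read.parquet("s3://bucket/events/")

    // The change: keep only a fraction of the original data.
    // A predicate on a hypothetical date column stands in for it here.
    val reduced = events.filter(events("event_date") >= "2023-01-01")

    // Downstream aggregation that triggers the shuffle whose read size shrank.
    val result = reduced
      .groupBy("user_id") // hypothetical key column
      .count()

    result.write.mode("overwrite").parquet("s3://bucket/output/")
    spark.stop()
  }
}
```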

What can I do to improve the job's runtime?

user3746406