I have a problem similar to the one described in this SO post:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of [N] tasks is bigger than spark.driver.maxResultSize (1024.0 MB)
This worked previously on Glue 2.0 with Spark 2.4, and broke when I moved to Glue 3.0 with Spark 3.1.
Since I am not doing an explicit broadcast, I believe this is because Spark 3.1 is automatically converting the join to a broadcast join at runtime.
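For reference, this is a minimal sketch of how I am checking whether Spark actually plans the join as a broadcast; `spark`, `left_df`, `right_df`, and the join key are placeholders for my real session, frames, and columns:

```python
# Placeholder names: left_df and right_df stand in for my real frames,
# each already the result of earlier joins/aggregations.
joined = left_df.join(right_df, on="id", how="inner")

# A BroadcastExchange / BroadcastHashJoin node in this plan would confirm
# that Spark chose to broadcast one side.
joined.explain(mode="formatted")

# The threshold Spark uses when deciding to broadcast (default 10 MB),
# and whether adaptive query execution is on:
print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))
print(spark.conf.get("spark.sql.adaptive.enabled"))
```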
My question is: how do I track down and fix Spark's misjudgement of the size or statistics? The join is between two DataFrames, each of which is the result of joins or aggregations of other DataFrames, so they are several layers removed from the physical files in S3.
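The only mitigation I know of is the blunt one sketched below (the config keys are real Spark settings, but whether they fully stop the runtime conversion on Glue 3.0 / Spark 3.1 is my assumption); I would rather understand why the size estimate is wrong than just turn the feature off:

```python
# Blunt workaround sketch, not a root-cause fix: disable automatic
# broadcast-join conversion so the join stays a sort-merge join.
# As far as I understand, on Spark 3.1 AQE reuses this same threshold,
# so -1 should cover both the static planner and the runtime conversion.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

# Bigger hammer: disable adaptive query execution entirely.
spark.conf.set("spark.sql.adaptive.enabled", "false")

# spark.driver.maxResultSize cannot be changed after the context starts;
# on Glue it would have to be passed when the job launches, e.g. through
# the job's --conf parameter:
#   --conf spark.driver.maxResultSize=4g
```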