I am trying to use SparkSQL to export my database to an S3 bucket in Parquet format.
One of my tables contains rows larger than 2 GB. The Spark job was submitted with --conf spark.executor.memory=21g --conf spark.executor.memoryOverhead=9g --conf spark.executor.cores=8.
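
For context, here is roughly how the export runs. This is only a minimal sketch assuming a JDBC source; the connection URL, table name, and bucket path are placeholders, not my real values.

```scala
import org.apache.spark.sql.SparkSession

object ExportToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("export-to-parquet")
      .getOrCreate()

    // Read one table from the source database over JDBC (placeholder URL, table, credentials).
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/mydb")
      .option("dbtable", "big_table")
      .option("user", "user")
      .option("password", "password")
      .load()

    // Write the table to S3 as Parquet (placeholder bucket and prefix).
    df.write
      .mode("overwrite")
      .parquet("s3a://my-bucket/exports/big_table/")

    spark.stop()
  }
}
```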
It seems there is a limitation in Spark: "Maximum size of rows in Spark jobs using Avro/Parquet." But I am not sure if that is what I am hitting here.
Is there a workaround for this?