When I write data from a DataFrame into a partitioned Parquet table, all the tasks complete successfully, but the process then gets stuck updating partition stats:
16/10/05 03:46:13 WARN log: Updating partition stats fast for:
16/10/05 03:46:14 WARN log: Updated size to 143452576
16/10/05 03:48:30 WARN log: Updating partition stats fast for:
16/10/05 03:48:31 WARN log: Updated size to 147382813
16/10/05 03:51:02 WARN log: Updating partition stats fast for:
This is the write call (part1 is the partition column, db.tbl the target table):
df.write.format("parquet").mode("overwrite").partitionBy("part1").insertInto("db.tbl")
My table has more than 400 columns and more than 1,000 partitions. Is there a way to optimize and speed up the partition stats update?
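
For a sense of scale, here is a rough back-of-the-envelope sketch in Python based on the log timestamps above. It assumes that each "Updating partition stats fast for:" line marks the start of one partition's update, that the gap between consecutive lines is representative, and that exactly 1,000 partitions are touched (the real count is higher). The gaps may also include work other than the stats update itself, so treat the result as an estimate only.

```python
from datetime import datetime

# Timestamps copied from the three "Updating partition stats fast for:" log lines.
starts = ["16/10/05 03:46:13", "16/10/05 03:48:30", "16/10/05 03:51:02"]


def seconds_between(stamps, fmt="%y/%m/%d %H:%M:%S"):
    """Return the gaps in seconds between consecutive log timestamps."""
    times = [datetime.strptime(s, fmt) for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


gaps = seconds_between(starts)        # seconds between successive partition updates
avg_gap = sum(gaps) / len(gaps)       # average seconds per partition

partitions = 1000                     # assumption: lower bound taken from "> 1000 partitions"
est_hours = avg_gap * partitions / 3600

print(f"gaps: {gaps}")
print(f"average per partition: {avg_gap:.1f} s")
print(f"estimated total for {partitions} partitions: {est_hours:.1f} h")
```

Under these assumptions each partition takes roughly two and a half minutes, which adds up to about 40 hours for 1,000 partitions, so the stats update dominates the job by far.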