My understanding was that spark will choose the 'default' number of partitions, solely based on the size of the file or if its a union of many parquet files, the number of parts.
However, in reading in a set of large parquet files, I see the that default # of partitions for an EMR cluster with a single d2.2xlarge is ~1200. However, in a cluster of 2 r3.8xlarge I'm getting default partitions of ~4700.
What metrics does Spark use to determine the default partitions?
EMR 5.5.0