I am writing partitioned output using the script below:

df_input_table.write
  .format("csv")
  .partitionBy("date", "region")
  .option("delimiter", "\t")
  .mode("overwrite")
  .save("s3://mybucket/myfolder/")
However, this results in one file under each partition. I would like to have multiple similarly sized files under each partition. How can I achieve this? I am on Spark 2.2.
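For reference, the output layout I get looks roughly like this (the date and region values here are made-up examples; each leaf directory holds exactly one part file):

s3://mybucket/myfolder/date=2018-01-01/region=us/part-00000-<uuid>.csv
s3://mybucket/myfolder/date=2018-01-01/region=eu/part-00000-<uuid>.csv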
I tried adding an extra key to the repartition, e.g. df_input_table.repartition($"region", $"date", $"region"). However, that leads to files of different sizes.
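One variant I am considering is salting: adding a uniform random key so each (date, region) group is split across a fixed number of similarly sized shuffle partitions. This is only a sketch; the "salt" column name and numFilesPerPartition are placeholders I chose, not part of my actual job:

import org.apache.spark.sql.functions.{col, rand}

// Placeholder knob: how many similarly sized files I want per partition.
val numFilesPerPartition = 8

df_input_table
  // A uniform random salt in [0, numFilesPerPartition) splits each
  // (date, region) group into roughly equal chunks.
  .withColumn("salt", (rand() * numFilesPerPartition).cast("int"))
  .repartition(col("date"), col("region"), col("salt"))
  .drop("salt") // helper column only; drop it so it is not written out
  .write
  .format("csv")
  .partitionBy("date", "region")
  .option("delimiter", "\t")
  .mode("overwrite")
  .save("s3://mybucket/myfolder/")

My understanding is that because rand() is uniform, the chunks come out similarly sized regardless of skew in the data columns, unlike repartitioning on an extra data column. Is this the right approach, or is there something more idiomatic in Spark 2.2?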
I would like to stick to Spark (rather than Hive).