My data is statically partitioned by date and dynamically partitioned by country, so each date can have as many as 180 country partitions. The layout looks something like this:
/20180101/cntry=us/ => 100 KB
/20180101/cntry=ca/ => 500 KB
/20180101/cntry=uk/ => 1.5 MB
For each date, the data is small (around 20-100 MB in total) and is split across the country partitions. In a situation like this, which would be better: repartition or coalesce? Since the data is small, would coalesce be the better choice? I am confused about when to prefer coalesce over repartition depending on the size of the data.
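To put rough numbers on how small each country partition is, here is a back-of-the-envelope sketch. It assumes the worst case of 180 countries and an even split across them, which is only an approximation (the sizes in the listing above are clearly skewed); the function and constant names are just for illustration.

```python
# Rough arithmetic only: assumes an even split across countries,
# which the real data does not follow (some countries are much larger).
MAX_COUNTRIES = 180          # upper bound on country partitions per date
DAILY_TOTALS_MB = (20, 100)  # approximate range of total data per date


def avg_partition_kb(total_mb: float, n_partitions: int) -> float:
    """Average size of one country partition in KB (1 MB = 1024 KB)."""
    return total_mb * 1024 / n_partitions


for total_mb in DAILY_TOTALS_MB:
    avg_kb = avg_partition_kb(total_mb, MAX_COUNTRIES)
    print(f"{total_mb} MB over {MAX_COUNTRIES} countries: ~{avg_kb:.0f} KB each")
```

So even on the largest days, a typical country partition holds well under 1 MB.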