1

Let's say I have a 1.2 GB file. With a block size of 128 MB, reading it would create 10 partitions. Now, if I repartition (or coalesce) it to 4 partitions, each partition will definitely be larger than 128 MB: each one has to hold roughly 300 MB of data, yet the block size is 128 MB. I'm a bit confused here. How is this possible? How can a partition be larger than the block size?

1 Answer

2

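Blocks and partitions are different things. A block is a fixed-size physical chunk of data saved at a specific location on your cluster or machine, so the 128 MB block size only governs how the file is stored. A partition is just a logical division of the data, independent of its physical location, so a single partition can cover several blocks and hold more than 128 MB.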
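
To make the arithmetic concrete, here is a minimal sketch in plain Python (no Spark involved). It assumes 1.2 GB is taken as 1200 MB and that the data is spread evenly across partitions; the helper names `block_count` and `avg_partition_mb` are hypothetical, made up for this illustration.

```python
import math

def block_count(file_mb: int, block_mb: int) -> int:
    """Number of fixed-size storage blocks needed to hold the file."""
    return math.ceil(file_mb / block_mb)

def avg_partition_mb(file_mb: int, num_partitions: int) -> float:
    """Average data per partition, assuming an even spread (an idealization)."""
    return file_mb / num_partitions

FILE_MB = 1200   # assumption: 1.2 GB taken as 1200 MB
BLOCK_MB = 128   # block size from the question

blocks = block_count(FILE_MB, BLOCK_MB)       # storage layout: 10 blocks
per_partition = avg_partition_mb(FILE_MB, 4)  # logical split: 300.0 MB each

print(blocks, per_partition)
```

The block count comes from how the file is stored, while the per-partition size comes only from how many partitions you ask for, which is why the second number is free to exceed 128 MB.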

For a more thorough explanation, see my answer to “Are files divided into blocks for storing in HDFS?”.

user2314737