I have two questions:
1. Can we set fewer partitions in a call to `coalesce` than the number of HDFS blocks? For example, if I have a 1 GB file and the HDFS block size is 128 MB (so 8 blocks), can I do `coalesce(1)`?
2. As we know, input files on HDFS are physically split on the basis of block size. Does Spark physically split the data further when we repartition or change the parallelism?
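To make the first question concrete, here is a sketch of the scenario I mean (the file path is hypothetical, and `spark` is assumed to be an existing `SparkSession`):

```scala
// A 1 GB file on HDFS with a 128 MB block size would normally
// be read as roughly 8 partitions (one per block).
val df = spark.read.text("hdfs:///data/one-gb-file.txt")
println(df.rdd.getNumPartitions) // expected to be around 8

// Is this legal, and what happens to the data physically?
val single = df.coalesce(1)
println(single.rdd.getNumPartitions) // 1
```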