I need to merge small Parquet files. I have multiple small Parquet files in HDFS, and I would like to combine them into files of roughly 128 MB each. So I read all the files with `spark.read()`, called `repartition()` on the resulting DataFrame, and wrote it back to the HDFS location.
My issue is that the input is approximately 7.9 GB of data, but after the repartition and save to HDFS the output grows to nearly 22 GB.
I have tried `repartition()`, `repartitionByRange()`, and `coalesce()`, but none of them solved the problem.
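For reference, here is a minimal sketch of the approach described above, assuming PySpark; the HDFS paths are placeholders, and the partition count is derived from the total input size so each output file lands near 128 MB:

```python
import math

def target_partitions(total_bytes: int,
                      target_file_bytes: int = 128 * 1024 * 1024) -> int:
    """Number of output partitions so each file is roughly target_file_bytes."""
    return max(1, math.ceil(total_bytes / target_file_bytes))

# For ~7.9 GB of input data, aiming for 128 MB output files:
n = target_partitions(int(7.9 * 1024**3))  # -> 64 partitions

# Hypothetical Spark usage (paths are placeholders):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# df = spark.read.parquet("hdfs:///path/to/small/files")
# df.repartition(n).write.mode("overwrite").parquet("hdfs:///path/to/merged")
```

Note that `repartition(n)` shuffles rows round-robin across partitions, which can destroy any ordering the source files had and thus weaken Parquet's run-length and dictionary encoding, which may explain part of the size growth.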