I just want to understand why Spark's repartition increases the data volume. When I did the same operation with coalesce, the output size was correct, but after a repartition my 100 GB of data grew to around 400 GB (or even more).
Here is the code that does the repartition:
SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("spark computation");
JavaSparkContext sc = new JavaSparkContext(conf);
SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
String partition = "file='hit_data'";
spark.read()
    .format("delta")
    .load("delta-table/clickstream/")
    // .where(partition)
    .repartition(10)
    .write()
    .format("delta")
    .mode("overwrite")
    // .option("replaceWhere", partition)
    .save("delta-table/clickstream/");
spark.stop();
sc.close();
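For comparison, the coalesce version that produced the expected size was essentially the following (a sketch assuming the same path and the same target of 10 partitions; unlike repartition, which does a full shuffle of all rows, coalesce only merges existing partitions without reshuffling their contents):

```java
import org.apache.spark.sql.SparkSession;

public class CoalesceVariant {
    public static void main(String[] args) {
        // Same setup as above; local mode for illustration.
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("spark computation")
                .getOrCreate();

        spark.read()
                .format("delta")
                .load("delta-table/clickstream/")
                // coalesce(10) merges existing partitions without a full
                // shuffle; repartition(10) would reshuffle every row
                .coalesce(10)
                .write()
                .format("delta")
                .mode("overwrite")
                .save("delta-table/clickstream/");

        spark.stop();
    }
}
```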