After reading this answer, I understand that the number of partitions when reading data from Hive is determined by the HDFS block size.
But I have run into a problem: I use Spark SQL to read a Hive table and save the data to a new Hive table, yet the two Hive tables have different numbers of partitions when loaded by Spark SQL.
val data = spark.sql("select * from src_table")
val partitionsNum = data.rdd.getNumPartitions
println(partitionsNum)
val newData = data
newData.write.mode("overwrite").format("parquet").saveAsTable("new_table")
I don't understand why the same data ends up with a different number of partitions.
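To make the difference concrete, the two counts can be printed side by side together with the things most likely to affect them. This is only a minimal diagnostic sketch: it assumes a Hive-enabled `SparkSession`, that both `src_table` and `new_table` already exist, and that the object name `ComparePartitions` is purely illustrative. The two `spark.sql.files.*` settings are the ones Spark documents for file-based sources such as Parquet; whether they apply to `src_table` depends on its storage format, which is why the sketch also prints the table descriptions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of the original question.
object ComparePartitions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("compare-partitions")
      .enableHiveSupport() // required to resolve Hive tables
      .getOrCreate()

    // Partition counts as seen by Spark when each table is loaded.
    val srcPartitions = spark.table("src_table").rdd.getNumPartitions
    val newPartitions = spark.table("new_table").rdd.getNumPartitions
    println(s"src_table: $srcPartitions partitions")
    println(s"new_table: $newPartitions partitions")

    // Settings that control how file-based sources (e.g. Parquet) are split.
    // getOption avoids an exception if a key is not available.
    Seq(
      "spark.sql.files.maxPartitionBytes",
      "spark.sql.files.openCostInBytes"
    ).foreach { key =>
      println(s"$key = ${spark.conf.getOption(key).getOrElse("<not set>")}")
    }

    // Storage format, SerDe, and location of each table, for comparison.
    spark.sql("DESCRIBE FORMATTED src_table").show(100, truncate = false)
    spark.sql("DESCRIBE FORMATTED new_table").show(100, truncate = false)

    spark.stop()
  }
}
```

If the two tables turn out to use different storage formats or have a different number and size of underlying files, that would be a reasonable first place to look for the cause of the mismatch.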