
After reading up on this answer, I understand that the number of partitions when reading data from Hive is decided by the HDFS block size.
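For context, these are the settings I would expect to drive the split size on read (a minimal sketch; `dfs.blocksize` may come back null if the HDFS defaults are not on the driver's classpath, and `spark.sql.files.maxPartitionBytes` assumes Spark 2.x):

  // HDFS block size the linked answer refers to (may be null depending on setup)
  println(spark.sparkContext.hadoopConfiguration.get("dfs.blocksize"))
  // Spark 2.x read-split cap for file-based sources like Parquet
  println(spark.conf.get("spark.sql.files.maxPartitionBytes"))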

But I have hit a problem: I use Spark SQL to read a Hive table and save the data to a new Hive table, yet the two tables have different partition numbers when loaded back with Spark SQL.

  val data = spark.sql("select * from src_table")
  val partitionsNum = data.rdd.getNumPartitions
  println(partitionsNum)
  val newData = data
  newData.write.mode("overwrite").format("parquet").saveAsTable("new_table")
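To make the mismatch concrete, this is roughly how I compare the two counts after the write, by reading the new table back with `spark.table`:

  val reloaded = spark.table("new_table")
  // this count differs from partitionsNum printed above
  println(reloaded.rdd.getNumPartitions)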

I don't understand why the same data ends up with different partition numbers.
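As a workaround, I can force a known partition count before writing (a sketch; this only pins the number of output files, and the count on read-back still depends on file sizes and split settings):

  // Workaround sketch: write with an explicit partition count.
  // Note: this controls output file count, not how the table is split on read.
  data.repartition(partitionsNum)
    .write.mode("overwrite").format("parquet").saveAsTable("new_table")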

  • Spark SQL doesn't depend on Hadoop configuration anymore: [How to change partition size in Spark SQL](https://stackoverflow.com/q/38249624/9613318). And there is not enough detail here. What is format of the input? How has it been stored? – Alper t. Turker May 15 '18 at 09:20
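Following the linked question, a minimal sketch of tuning the read-side split size (assuming Spark 2.x; the 32 MB value is arbitrary, for illustration):

  // Sketch: shrink the max bytes packed into one input partition,
  // then reload the table to see the effect on the partition count.
  spark.conf.set("spark.sql.files.maxPartitionBytes", (32 * 1024 * 1024).toString)
  println(spark.table("new_table").rdd.getNumPartitions)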

0 Answers