DataFrame numPartitions default value

Question

I'm trying to read a database table using JDBC:

val df = spark.read.jdbc("<database url>", "<some table name>", <some DbProperties>)

and then write it to another database:

df.write.mode(SaveMode.Append).jdbc("<other database url>", "<same table name>", <some DbProperties>)

1. If we do not specify the numPartitions option in the DB properties, what default value of numPartitions does Spark use to read the table into df?
2. If I write the above df to a table in another database, still without specifying numPartitions, will parallel connections be created while writing?
3. Suppose I set numPartitions to 8 while reading. When writing this df to the target DB, will numPartitions = 8 still apply without my specifying it explicitly on the write?

possible duplicate of https://stackoverflow.com/questions/43150694/partitioning-in-spark-while-reading-from-rdbms-via-jdbc — sathya, Jul 09 '20 at 18:21

sathya · Answer 1 · 2020-07-09T18:30:57.647

3

If you specify neither {partitionColumn, lowerBound, upperBound, numPartitions} nor {predicates}, Spark will use a single executor and create a single non-empty partition. All data will be read through a single transaction, and the read will be neither distributed nor parallelized.
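As a rough illustration of the difference, here is a minimal sketch that reads the same table twice, once with no partitioning options and once with a partition column. The JDBC URL, credentials, table name, the numeric `id` column, and the bounds are all hypothetical placeholders, and it assumes the job is launched with spark-submit with the matching JDBC driver on the classpath.

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

object PartitionedJdbcRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("partitioned-jdbc-read").getOrCreate()

    // Hypothetical connection details; replace with real values.
    val url = "jdbc:postgresql://db-host:5432/source_db"
    val props = new Properties()
    props.setProperty("user", "spark_user")
    props.setProperty("password", "secret")

    // No partitioning options: Spark issues one query and builds one partition.
    val single = spark.read.jdbc(url, "some_table", props)
    println(s"unpartitioned read: ${single.rdd.getNumPartitions} partition(s)")

    // Range-partitioned read on an assumed numeric column "id".
    // lowerBound/upperBound only determine the partition stride;
    // rows outside the range are still read, not filtered out.
    val partitioned = spark.read.jdbc(
      url,
      "some_table",
      "id",      // columnName (assumed to exist and be numeric)
      1L,        // lowerBound (assumed)
      1000000L,  // upperBound (assumed)
      8,         // numPartitions
      props
    )
    println(s"partitioned read: ${partitioned.rdd.getNumPartitions} partition(s)")

    spark.stop()
  }
}
```

Printing `rdd.getNumPartitions` is a simple way to confirm how many partitions a given read actually produced.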

See also the Spark docs for more information on Spark's JDBC integration:

https://spark.apache.org/docs/2.3.2/sql-programming-guide.html#jdbc-to-other-databases

edited Jul 09 '20 at 18:30

answered Jul 09 '20 at 18:23

sathya


Ok, suppose I gave `numPartitions` while reading the `DataFrame`, will the same `numPartitions` be used while writing the same `df` (without explicitly specifying it while writing) (Qn. #3)? – Sparker0i Jul 09 '20 at 19:54
When you run df.write, each of the original partitions in df is written independently, but it is best practice to also set `numPartitions` on the write. – sathya Jul 10 '20 at 07:02
Ok so that means the same `numPartitions` is retained while writing to the database without me explicitly specifying it, correct? – Sparker0i Jul 10 '20 at 11:22
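
On the write-side question, a minimal sketch may help. It assumes a DataFrame that already exists and a hypothetical helper named `writeWithCap`; the URL, table name, and properties are supplied by the caller. Write parallelism follows the DataFrame's current partition count rather than the read option itself, so a df read with 8 partitions and not repartitioned since is written as 8 partitions. Setting `numPartitions` on the write acts as an upper limit on concurrent JDBC connections: if the DataFrame has more partitions than that limit, Spark coalesces down to it before writing.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

object JdbcWriteParallelism {
  // Hypothetical helper: appends df to a JDBC table while capping the
  // number of concurrent connections at maxConnections.
  def writeWithCap(
      df: DataFrame,
      url: String,
      table: String,
      props: Properties,
      maxConnections: Int): Unit = {
    // Each partition is written over its own connection, so this count
    // is the parallelism Spark would use without an explicit cap.
    println(s"partitions before write: ${df.rdd.getNumPartitions}")

    df.write
      .mode(SaveMode.Append)
      .option("numPartitions", maxConnections.toString) // upper limit only
      .jdbc(url, table, props)
  }
}
```

Any shuffle or `repartition`/`coalesce` between the read and the write changes the partition count, so checking `df.rdd.getNumPartitions` just before writing is more reliable than relying on the value passed at read time.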
