2

I'm trying to read something from a Database table using JDBC:

val df = spark.read.jdbc("<database url>", "<some table name>", <some DbProperties>)

and then write it to another database:

df.write.mode(SaveMode.Append).jdbc("<other database url>", "<same table name>", <some DbProperties>)
  1. If we do not specify numPartitions option in the Db Properties, what will be the default value for numPartitions Spark uses to read the table from Database into df?
  2. If I want to write the above df into another table of another database, if I still don't specify numPartitions, will there be parallel connections created while writing to the Database?
  3. Suppose while reading I have given numPartitions as 8, while writing this df onto the target DB, will the numPartitions = 8 still be valid without me explicitly specifying it while writing?
Sparker0i
  • 1,787
  • 4
  • 35
  • 60
  • possible duplicate of https://stackoverflow.com/questions/43150694/partitioning-in-spark-while-reading-from-rdbms-via-jdbc – sathya Jul 09 '20 at 18:21

1 Answers1

3

If you don't specify either {partitionColumn, lowerBound, upperBound, numPartitions} or {predicates} Spark will use a single executor and create a single non-empty partition. All data will be processed using a single transaction and reads will be neither distributed nor parallelized.

See also:

Please check the spark docs for more information on spark JDBC integration

sathya
  • 1,982
  • 1
  • 20
  • 37
  • Ok, suppose I gave `numPartitions` while reading the `DataFrame`, will the same `numPartitions` be used while writing the same `df` (without explicitly specifying it while writing) (Qn. #3)? – Sparker0i Jul 09 '20 at 19:54
  • When you run df.write, each of the original partitions in df is written independently. but if you also add ```numPartitions``` as a best practice. – sathya Jul 10 '20 at 07:02
  • Ok so that means the same `numPartitions` is retained while writing to the database without me explicitly specifying it, correct? – Sparker0i Jul 10 '20 at 11:22