I have tables stored in MySQL with ID as the primary key. Using Spark, I want to write a DataFrame to MySQL such that rows that already exist in the table (based on the primary key) are ignored and only the new rows are written.

ID (PK) | Name | Description

    1   |   A  | Something1
    2   |   B  | Something2

When I write a Spark DataFrame (containing IDs 3 and 4) into this table with SaveMode.Append:

// JDBC connection properties
val prop = new java.util.Properties
prop.setProperty("driver", "com.mysql.jdbc.Driver")
prop.setProperty("user", "username")
prop.setProperty("password", "XXX")

dataFrame
  .write
  .mode(SaveMode.Append)
  .jdbc("jdbc:mysql://XXXXX:3306/db_name", "table_name", prop)

The following is the result I achieve:

ID (PK) | Name | Description

    1   |   A  | Something1
    2   |   B  | Something2
    3   |   C  | Something3
    4   |   D  | Something4

But when I use SaveMode.Ignore (with IDs 4, 5 and 6), the entire DataFrame write is ignored instead of just the row with ID 4.
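For reference, that attempt is the same write with only the mode changed:

dataFrame
  .write
  .mode(SaveMode.Ignore)
  .jdbc("jdbc:mysql://XXXXX:3306/db_name", "table_name", prop)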

Is this behaviour normal? What if I want to store IDs 5 and 6 when the DataFrame consists of IDs 4, 5 and 6? Is there another recommended way to do this?
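One workaround I'm considering is to read the existing primary keys back from MySQL and anti-join against them before appending. This is only a sketch: it reuses prop from the code above and assumes spark is the active SparkSession.

// Read back just the primary-key column of the target table.
val url = "jdbc:mysql://XXXXX:3306/db_name"
val existingIds = spark.read.jdbc(url, "table_name", prop).select("ID")

// Drop rows whose ID already exists, then append only the new ones.
dataFrame
  .join(existingIds, Seq("ID"), "left_anti")
  .write
  .mode(SaveMode.Append)
  .jdbc(url, "table_name", prop)

I'm aware this is not atomic (a concurrent writer could insert a duplicate key between the read and the append), so I'm not sure it's the right approach.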
