
I would like to perform update and insert operations using Spark. Please find the image reference of the existing table below.

[image: existing table]

Here I am updating the location and inserttime for id 101 and inserting 2 more records:

[image: updated and newly inserted records]

and writing to the target table with mode overwrite:

  (df.write.format("jdbc")
     .option("url", "jdbc:mysql://localhost/test")
     .option("driver", "com.mysql.jdbc.Driver")
     .option("dbtable", "temptgtUpdate")
     .option("user", "root")
     .option("password", "root")
     .option("truncate", "true")
     .mode("overwrite")
     .save())

After executing the above command, the data inserted into the DB table is corrupted:

[image: corrupted data in the DB table]

Data in the dataframe

[image: data in the DataFrame]

Could you please let me know your observations and solutions?

TEJASWAKUMAR

5 Answers


The Spark JDBC writer supports the following save modes:

  • append: append the contents of the DataFrame to the existing table
  • overwrite: overwrite the existing data
  • ignore: silently ignore the operation if data already exists
  • error (default): throw an exception if data already exists

Since you are using "overwrite" mode, Spark recreates your table based on the DataFrame's schema and inferred column lengths. If you want your own table definition, create the table first and use "append" mode.
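
As a minimal sketch of that approach (assuming the target table temptgtUpdate has already been created in MySQL with the column definitions you want, and reusing the connection details from the question):

  # Append to a pre-created table so Spark keeps the existing definition
  # instead of recreating it from the DataFrame schema.
  (df.write.format("jdbc")
     .option("url", "jdbc:mysql://localhost/test")
     .option("driver", "com.mysql.jdbc.Driver")
     .option("dbtable", "temptgtUpdate")
     .option("user", "root")
     .option("password", "root")
     .mode("append")
     .save())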

vaquar khan
  • I would like you to check the first image: there I already have a table with data, and I want to update records in that table. – TEJASWAKUMAR May 10 '20 at 01:58
  • Is it possible for you to share a dummy df in your question to recreate the issue? If the data is already in the table, it should be updated. – vaquar khan May 10 '20 at 04:48

i would like to perform update and insert operation using spark

There is no equivalent of the SQL UPDATE statement in Spark SQL. Nor is there an equivalent of the SQL DELETE WHERE statement. Instead, you will have to delete the rows requiring update outside of Spark, then write the Spark DataFrame containing the new and updated records to the table using append mode (in order to preserve the remaining existing rows in the table).
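
A minimal sketch of that pattern, assuming the pymysql library is used for the delete step, that the rows to be replaced can be identified by their id values, and reusing the connection details from the question:

  import pymysql

  # Collect the ids of the new/updated records held in the DataFrame `df`
  ids = [(row.id,) for row in df.select("id").collect()]

  # Delete the rows requiring update outside of Spark
  conn = pymysql.connect(host="localhost", user="root",
                         password="root", database="test")
  try:
      with conn.cursor() as cur:
          cur.executemany("DELETE FROM temptgtUpdate WHERE id = %s", ids)
      conn.commit()
  finally:
      conn.close()

  # Append the new and updated records, preserving the remaining rows
  (df.write.format("jdbc")
     .option("url", "jdbc:mysql://localhost/test")
     .option("driver", "com.mysql.jdbc.Driver")
     .option("dbtable", "temptgtUpdate")
     .option("user", "root")
     .option("password", "root")
     .mode("append")
     .save())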

Chris
  • Then what is the functionality of the truncate option and overwrite mode? – TEJASWAKUMAR May 10 '20 at 11:01
  • I have clarified my answer around SQL Delete. For the Spark JDBC write modes you should refer to the documentation. – Chris May 10 '20 at 11:23
  • There doesn't seem to be a 'truncate' option in the documentation https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=dataframewriter#pyspark.sql.DataFrameWriter. – Chris Feb 10 '21 at 14:19

In case you need to perform UPSERT / DELETE operations in your PySpark code, I suggest you use the pymysql library and execute your upsert/delete operations there. Please check this post for more info and a code sample for reference: Error while using INSERT INTO table ON DUPLICATE KEY, using a for loop array

Please modify the code sample as per your needs.
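
For reference, a minimal upsert sketch along those lines (the column names id, location and inserttime are taken from the question and may need adjusting to your actual schema; ON DUPLICATE KEY UPDATE requires a primary or unique key on id):

  import pymysql

  # Rows to upsert, collected from the DataFrame `df`
  rows = [(r.id, r.location, r.inserttime) for r in df.collect()]

  # Insert new rows and update existing ones in a single statement
  sql = """
      INSERT INTO temptgtUpdate (id, location, inserttime)
      VALUES (%s, %s, %s)
      ON DUPLICATE KEY UPDATE location = VALUES(location),
                              inserttime = VALUES(inserttime)
  """

  conn = pymysql.connect(host="localhost", user="root",
                         password="root", database="test")
  try:
      with conn.cursor() as cur:
          cur.executemany(sql, rows)
      conn.commit()
  finally:
      conn.close()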

Yuva

Upsert logic is working fine when following the below steps:

  df = (spark.read.format("csv").
        load("file:///C:/Users/test/Desktop/temp1/temp1.csv", header=True,
             delimiter=','))

and doing this

  (df.write.format("jdbc").
    option("url", "jdbc:mysql://localhost/test").
    option("driver", "com.mysql.jdbc.Driver").
    option("dbtable", "temptgtUpdate").
    option("user", "root").
    option("password", "root").
    option("truncate", "true").
    mode("overwrite").save())

Still, I am unable to understand why it fails when I write using the DataFrame directly.

TEJASWAKUMAR

I wouldn't recommend TRUNCATE, since it would actually drop the table and create a new one. While doing this, the table may lose column-level attributes that were set earlier... so be careful while using TRUNCATE, and be sure it is OK to drop and recreate the table.

Yuva
  • It's actually the other way around, the `"truncate"` option avoids dropping the table. Here's the reference from the documentation: > "This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed." https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html – Lucas Sousa Aug 23 '22 at 18:14