
I would like to perform update and insert operations using Spark. Please find the image reference of the existing table below.

[image: existing table]

Here I am updating the location and inserttime for id 101 and inserting 2 more records:

[image: updated and newly inserted records]

and writing to the target table with mode overwrite:

  (df.write.format("jdbc")
     .option("url", "jdbc:mysql://localhost/test")
     .option("driver", "com.mysql.jdbc.Driver")
     .option("dbtable", "temptgtUpdate")
     .option("user", "root")
     .option("password", "root")
     .option("truncate", "true")
     .mode("overwrite")
     .save())

After executing the above command, the data inserted into the DB table is corrupted:

[image: corrupted data in the DB table]

Data in the dataframe

[image: data in the DataFrame]

Could you please let me know your observations and solutions?

TEJASWAKUMAR

5 Answers


The Spark JDBC writer supports the following save modes:

  • append: append the contents of the DataFrame to the existing table
  • overwrite: overwrite the existing data
  • ignore: silently ignore the operation if data already exists
  • error (default): throw an exception if data already exists

Since you are using "overwrite" mode, Spark recreates your table based on the DataFrame's schema and inferred column lengths. If you want your own table definition, create the table first and use "append" mode.
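
As a minimal sketch of that approach (assuming the target table temptgtUpdate has already been created in MySQL with the column definitions you want, and reusing the connection details from the question):

  # Append to a pre-created table so Spark keeps the existing definition
  # instead of recreating it from the DataFrame schema.
  (df.write.format("jdbc")
     .option("url", "jdbc:mysql://localhost/test")
     .option("driver", "com.mysql.jdbc.Driver")
     .option("dbtable", "temptgtUpdate")
     .option("user", "root")
     .option("password", "root")
     .mode("append")
     .save())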

vaquar khan
  • I would like you to check the first image: there I already have a table with data, and I want to update records in that table. – TEJASWAKUMAR May 10 '20 at 01:58
  • Is it possible for you to share a dummy df in your question to recreate the issue? If the data is already in the table, it should be updated. – vaquar khan May 10 '20 at 04:48

i would like to perform update and insert operation using spark

There is no equivalent of the SQL UPDATE statement in Spark SQL. Nor is there an equivalent of the SQL DELETE WHERE statement. Instead, you will have to delete the rows requiring update outside of Spark, then write the Spark DataFrame containing the new and updated records to the table using append mode (in order to preserve the remaining existing rows in the table).
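
A minimal sketch of that pattern, assuming the pymysql library is used for the delete step, that the rows to be replaced can be identified by their id values, and reusing the connection details from the question:

  import pymysql

  # Collect the ids of the new/updated records held in the DataFrame `df`
  ids = [(row.id,) for row in df.select("id").collect()]

  # Delete the rows requiring update outside of Spark
  conn = pymysql.connect(host="localhost", user="root",
                         password="root", database="test")
  try:
      with conn.cursor() as cur:
          cur.executemany("DELETE FROM temptgtUpdate WHERE id = %s", ids)
      conn.commit()
  finally:
      conn.close()

  # Append the new and updated records, preserving the remaining rows
  (df.write.format("jdbc")
     .option("url", "jdbc:mysql://localhost/test")
     .option("driver", "com.mysql.jdbc.Driver")
     .option("dbtable", "temptgtUpdate")
     .option("user", "root")
     .option("password", "root")
     .mode("append")
     .save())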

Chris
  • Then what is the functionality of the truncate option and overwrite mode? – TEJASWAKUMAR May 10 '20 at 11:01
  • I have clarified my answer around SQL Delete. For the Spark JDBC write modes you should refer to the documentation. – Chris May 10 '20 at 11:23
  • There doesn't seem to be a 'truncate' option in the documentation https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=dataframewriter#pyspark.sql.DataFrameWriter. – Chris Feb 10 '21 at 14:19

In case you need to perform UPSERT / DELETE operations in your PySpark code, I suggest you use the pymysql library and execute your upsert/delete operations there. Please check this post for more info and a code sample for reference: Error while using INSERT INTO table ON DUPLICATE KEY, using a for loop array

Please modify the code sample as per your needs.
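
For reference, a minimal upsert sketch along those lines (the column names id, location and inserttime are taken from the question and may need adjusting to your actual schema; ON DUPLICATE KEY UPDATE requires a primary or unique key on id):

  import pymysql

  # Rows to upsert, collected from the DataFrame `df`
  rows = [(r.id, r.location, r.inserttime) for r in df.collect()]

  # Insert new rows and update existing ones in a single statement
  sql = """
      INSERT INTO temptgtUpdate (id, location, inserttime)
      VALUES (%s, %s, %s)
      ON DUPLICATE KEY UPDATE location = VALUES(location),
                              inserttime = VALUES(inserttime)
  """

  conn = pymysql.connect(host="localhost", user="root",
                         password="root", database="test")
  try:
      with conn.cursor() as cur:
          cur.executemany(sql, rows)
      conn.commit()
  finally:
      conn.close()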

Yuva

Upsert logic is working fine when following the below steps:

  df = (spark.read.format("csv").
        load("file:///C:/Users/test/Desktop/temp1/temp1.csv", header=True,
             delimiter=','))

and doing this

  (df.write.format("jdbc").
    option("url", "jdbc:mysql://localhost/test").
    option("driver", "com.mysql.jdbc.Driver").
    option("dbtable", "temptgtUpdate").
    option("user", "root").
    option("password", "root").
    option("truncate", "true").
    mode("overwrite").save())

Still, I am unable to understand why it fails when I write using the DataFrame directly.

TEJASWAKUMAR

I wouldn't recommend TRUNCATE, since it would actually drop the table and create a new one. While doing this, the table may lose column-level attributes that were set earlier... so be careful while using TRUNCATE, and be sure it is OK to drop and recreate the table.

Yuva
  • It's actually the other way around, the `"truncate"` option avoids dropping the table. Here's the reference from the documentation: > "This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed." https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html – Lucas Sousa Aug 23 '22 at 18:14