I'm using the following simple piece of code to load data into a SQL DB from Azure Databricks:

import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._
import org.apache.spark.sql.SaveMode

// Read the source data
val loadsqldb = spark.sql("""SELECT * FROM TABLEA""")

// Write config for the Azure SQL DB connector
val writeConfig = Config(Map(
  "url"            -> url,
  "databaseName"   -> databaseName,
  "dbTable"        -> "dbo.TABLENAME",
  "user"           -> user,
  "password"       -> password,
  "connectTimeout" -> "5"
))

// Overwrite the target table (truncate, then insert)
loadsqldb.write.mode(SaveMode.Overwrite).option("truncate", true).sqlDB(writeConfig)

We have a unique index on our SQL Server which we MUST retain, and it looks like the following:

CREATE UNIQUE INDEX i__NeighbourhoodCategoryHourlyForecast_unique
ON almanac_devdb.dbo.NeighbourhoodCategoryHourlyForecast (fk_neighbourhoods, fk_categories, local_date, local_hour)
GO

When I try to load data into our SQL DB, we get the following error:

Cannot insert duplicate key row in object 'dbo.TABLENAME' with unique index 'i__TABLENAME_unique'. The duplicate key value is (36983, 130000, 2020-08-12, 14).

It was suggested that I find some way to get Databricks to do a MERGE instead of an OVERWRITE, but I'm not sure how to do that, or even whether that is the correct thing to do?

Carltonp

2 Answers


I would suggest following the steps below (don't change anything on the Spark side; instead, execute the steps below on the SQL Server side):

  1. Create a view on top of the target table to which you want to write the Spark DataFrame data.
  2. Create an INSTEAD OF INSERT trigger on that view so that all insert commands go through the view created in step 1:
CREATE TRIGGER <trigger_name>
ON <view_created_in_step_1>
INSTEAD OF INSERT
AS
BEGIN
    -- MERGE statement goes here
END
  3. The MERGE statement performs the insert and the update in a single statement (a hedged sketch follows below). Follow this tutorial.
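
For reference, a minimal sketch of what such a trigger could look like. It assumes the base table is dbo.TABLENAME, the view from step 1 is called dbo.vw_TABLENAME, the key columns are the ones from the unique index in the question, and forecast_value stands in for whatever non-key columns the table actually has; the trigger name, view name and forecast_value are placeholders, not names from the question:

CREATE TRIGGER trg_TABLENAME_upsert   -- placeholder trigger name
ON dbo.vw_TABLENAME                   -- placeholder view from step 1
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Upsert the incoming rows into the base table, using the unique-index columns as the key
    MERGE dbo.TABLENAME AS target
    USING inserted AS source
        ON  target.fk_neighbourhoods = source.fk_neighbourhoods
        AND target.fk_categories     = source.fk_categories
        AND target.local_date        = source.local_date
        AND target.local_hour        = source.local_hour
    WHEN MATCHED THEN
        UPDATE SET target.forecast_value = source.forecast_value   -- placeholder non-key column(s)
    WHEN NOT MATCHED THEN
        INSERT (fk_neighbourhoods, fk_categories, local_date, local_hour, forecast_value)
        VALUES (source.fk_neighbourhoods, source.fk_categories,
                source.local_date, source.local_hour, source.forecast_value);
END

One thing to watch: MERGE will raise an error if the incoming batch itself contains more than one row for the same key, so de-duplicating the DataFrame before writing (as in the other answer below) is still worthwhile.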

You may also want to take a look at this tutorial for a transaction-based query that updates or inserts a row depending on whether the key matches.

Som
  • thanks for reaching out. However, I don't understand how your suggestion will work by making the change on the SQL Server. This is because I am loading the data from Databricks. Surely the change needs to be on Databricks? – Carltonp Aug 15 '20 at 13:37
  • The error popping up is related to a primary key/unique constraint. This is because when you say `df.write`, by default Spark will try to invoke an `insert` statement on the mentioned target table. To override that blind insert there is a mechanism known as an `INSTEAD OF INSERT` trigger, where you can define your own action (like performing an upsert when an insert is invoked) – Som Aug 16 '20 at 01:46
  • thanks for reaching out again. Therefore, I will try and make the change on my SQL Server. If you have any additional examples that would be much appreciated. If not, I'll try and work with the samples you provided – Carltonp Aug 16 '20 at 11:42
  • the link you mentioned for the Merge statement tutorial doesn't have any information on Merging data – Carltonp Aug 16 '20 at 11:55
  • I also looked at the following example https://www.tutorialgateway.org/instead-of-insert-triggers-in-sql-server/ . Is it really necessary to create a temporary table? – Carltonp Aug 16 '20 at 12:40
  • Please feel free to update this solution as `update-1` saying what exactly you did – Som Aug 19 '20 at 03:26

Drop duplicates for the unique index key.

df.dropDuplicates(Array("col1","col2"))

After this, try to write into your database.
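
For the unique index in the question, that could look like the following sketch, assuming the DataFrame columns carry the same names as the index columns and reusing loadsqldb and writeConfig from the question:

// Keep a single row per unique-index key before writing
// (column names are assumed to match the index definition)
val deduped = loadsqldb.dropDuplicates(Array("fk_neighbourhoods", "fk_categories", "local_date", "local_hour"))

deduped.write.mode(SaveMode.Overwrite).option("truncate", true).sqlDB(writeConfig)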

Lamanus
  • You can't use this. Think of a scenario where you have no duplicates in the dataframe but the key you are trying to insert is already present in the DB. – Som Aug 15 '20 at 05:50
  • @Someshwar, you're correct when you say 'have no duplicates in the dataframe', however I'm still unsure how making the change on the SQL Server will fix this? – Carltonp Aug 15 '20 at 13:39
  • An `INSTEAD OF INSERT` trigger will help you override the default insert. Basically, here we want to insert if the key is not available and update if it's available. You may want to gain more info here: https://www.sqlservertutorial.net/sql-server-triggers/sql-server-instead-of-trigger – Som Aug 15 '20 at 15:38