1

I am copying data from Excel to SQL tables using Talend. Before copying I need to check for Primary key's column. If the key is already existing in the database I need to update that record or else insert that record in the database and it will get assigned an auto incremented key.

How do I do this?

ydaetskcoR
  • 53,225
  • 8
  • 158
  • 177
Quick-gun Morgan
  • 338
  • 12
  • 31
  • What RDBMS are you using? If I remember rightly, you were using SQL Server before but it might be useful to clarify it here by adding the appropriate RDBMS tag to your question. I've put the sql-server tag on it for now though. Correct this if I'm wrong. – ydaetskcoR Jul 17 '14 at 16:41
  • Yes, I am using sql-server only. Thanks. – Quick-gun Morgan Jul 17 '14 at 20:13

2 Answers2

0

You can use the easy but slow way and simply use either the INSERT OR UPDATE or the UPDATE OR INSERT (the difference being whether the update or insert is tried first) option in the txxxSqlOutput component. This is supported by most of the RDBMS components including the SQL Server one:

SQL Server Upsert options

If you are expecting more updates than inserts then the Update or insert option would be better performing and vice versa.

Alternatively, you can query the database beforehand and then inner join that on the primary key with your data flow to be added to the database. The matches are obviously updates and the rejects (enable Catch inner join reject on another data flow in the tMap) are then obviously inserts.

Faster way to upsert data

With the tMap being configured like so:

tMap configuration to allow for more performant upserts

The second way can be much more performant, especially when upserting large data sets as the first option has to attempt to insert each line, line by line, and then if it fails it then rewrites it as an update statement.

One other benefit of the second is that it can also be contained in a single batch/transaction as you are no longer having to deliberately catch a failed insert/update and then update/insert instead. This means you can make your job a lot more robust and handle errors more appropriately.

ydaetskcoR
  • 53,225
  • 8
  • 158
  • 177
0

Actually i was working on the same problem a few days ago and this is how i worked:

  1. Loaded all data from excel to sql temporary tables (The tempory tables must have the same structure as the tables in the database and must be created in the same database)
  2. Create a stored procedure that compare primarykey of the tempory table with the database one (Insert if doesn't exist else update)
  3. Called the stored procedure in ETL task.

Example of the stored procedure:

   CREATE PROCEDURE [dbo].[up_ImportProductsData]
    AS
    BEGIN
      @Product_PK int,     //Product primary key
      @Product_Name NVarchar(30) // Product name

    // we create a cursor on product temporay table
    DECLARE c_Product CURSOR FOR
    SELECT Product_PK,
           Product_NAME,
    FROM Product_tempory_table

    //We open the cursor to check product pk one by one
    OPEN c_Product CURSOR
    FETCH c_Product CURSOR INTO 
    @Product_PK, @Product_Name 
    WHILE @@fetch_status=0

    //Checking if the Product_PK exits
    BEGIN

    //If the PRimary key doesnt exists then we insert it
     IF NOT EXISTS(SELECT * FROM dbo.Product_Table
    WHERE Product_PK=@Product_PK)
    BEGIN
            INSERT INTO dbo.Product_Table
                    ( 
                       Product_PK,
                       Product_NAME
                    )
            VALUES  ( 

                      @Product_PK, @Product_Name
                    )  
       END

    //If The primary key exists we update it
    ELSE
      BEGIN
         UPDATE dbo.Product_Table
         SET    Product_NAME=@Product_Name 

         WHERE Product_PK= @Product_PK

      END
    FETCH c_Product INTO 
    @Product_PK, @Product_Name 
    END
    CLOSE c_Product
    DEALLOCATE c_Product 

END
ASNAOUI Ayoub
  • 462
  • 2
  • 9
  • This is truly hideous, I'm sorry. Why the hell are you using a cursor to iterate through that list of data when you have an ETL program (as mentioned in step 3) that can do that much easier and more transparently. You also shouldn't be putting this much logic into the database when you have an ETL tool/process available to you . – ydaetskcoR Jul 17 '14 at 16:34
  • By the looks of things this will also then do single line inserts and commits which could easily be done with an UPSERT style statement such as http://stackoverflow.com/a/108416/2291321 in SQL Server or `INSERT ... ON DUPLICATE KEY UPDATE` in MySQL or just use the INSERT OR UPDATE option in the Output component. – ydaetskcoR Jul 17 '14 at 16:39
  • Sorry for the late reply. Yes, I might be requiring this sort of approach instead of directly inserting/updating the tables. My product uses actual tables so if something goes wrong during the insertion I don't want my product to stop working. And that's why I guess this approach of creating temporary tables would be better to go with. – Quick-gun Morgan Jul 17 '14 at 19:07