I would like to upsert my pandas DataFrame into a SQL Server table. This question has a workable solution for PostgreSQL, but T-SQL does not have an ON CONFLICT variant of INSERT. How can I accomplish the same thing for SQL Server?

Gord Thompson

1 Answer

Update, July 2022: You can save some typing by using this function to build the MERGE statement and perform the upsert for you.


SQL Server offers the MERGE statement:

import pandas as pd
import sqlalchemy as sa

connection_string = (
    "Driver=ODBC Driver 17 for SQL Server;"
    "Server=192.168.0.199;"
    "UID=scott;PWD=tiger^5HHH;"
    "Database=test;"
    "UseFMTONLY=Yes;"
)
connection_url = sa.engine.URL.create(
    "mssql+pyodbc",
    query={"odbc_connect": connection_string}
)

engine = sa.create_engine(connection_url, fast_executemany=True)

with engine.begin() as conn:
    # step 0.0 - create test environment
    conn.exec_driver_sql("DROP TABLE IF EXISTS main_table")
    conn.exec_driver_sql(
        "CREATE TABLE main_table (id int primary key, txt varchar(50))"
    )
    conn.exec_driver_sql(
        "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
    )
    # step 0.1 - create DataFrame to UPSERT
    df = pd.DataFrame(
        [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
    )

    # step 1 - upload DataFrame to temporary table
    df.to_sql("#temp_table", conn, index=False, if_exists="replace")

    # step 2 - merge temp_table into main_table
    conn.exec_driver_sql(
        """\
        MERGE main_table WITH (HOLDLOCK) AS main
        USING (SELECT id, txt FROM #temp_table) AS temp
        ON (main.id = temp.id)
        WHEN MATCHED THEN
            UPDATE SET txt = temp.txt
        WHEN NOT MATCHED THEN
            INSERT (id, txt) VALUES (temp.id, temp.txt);
        """
    )

    # step 3 - confirm results
    result = conn.exec_driver_sql(
        "SELECT * FROM main_table ORDER BY id"
    ).fetchall()
    print(result)
    # [(1, 'row 1 new text'), (2, 'new row 2 text')]
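
The MERGE text in step 2 can also be generated from lists of column names, which handles compound (multi-column) keys too. What follows is only a minimal sketch: `build_merge_sql` and its parameters are hypothetical names chosen for illustration, not the function mentioned in the update above and not part of pandas or SQLAlchemy. It assumes the temporary table has the same column names as the target, and it interpolates identifiers without quoting them, so they must come from trusted code rather than user input.

```python
from typing import Sequence


def build_merge_sql(
    target: str,
    source: str,
    key_cols: Sequence[str],
    data_cols: Sequence[str],
) -> str:
    """Return a T-SQL MERGE statement that upserts `source` into `target`.

    Hypothetical helper: identifiers are inserted as-is (no quoting),
    so pass only trusted table and column names.
    """
    if not key_cols:
        raise ValueError("at least one key column is required")

    all_cols = list(key_cols) + list(data_cols)
    col_list = ", ".join(all_cols)
    # One equality per key column, joined with AND for compound keys.
    on_clause = " AND ".join(f"main.{c} = temp.{c}" for c in key_cols)
    insert_values = ", ".join(f"temp.{c}" for c in all_cols)

    lines = [
        f"MERGE {target} WITH (HOLDLOCK) AS main",
        f"USING (SELECT {col_list} FROM {source}) AS temp",
        f"ON ({on_clause})",
    ]
    # Skip the UPDATE branch when every column is part of the key.
    if data_cols:
        set_clause = ", ".join(f"{c} = temp.{c}" for c in data_cols)
        lines += ["WHEN MATCHED THEN", f"    UPDATE SET {set_clause}"]
    lines += [
        "WHEN NOT MATCHED THEN",
        f"    INSERT ({col_list}) VALUES ({insert_values});",
    ]
    return "\n".join(lines)


# Same table and columns as the example above.
sql = build_merge_sql("main_table", "#temp_table", ["id"], ["txt"])
print(sql)
```

The returned string could then be passed to `conn.exec_driver_sql(sql)` in place of the hand-written statement in step 2. For a compound key, list every key column in `key_cols`.
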
Gord Thompson
  • For an example that can be used with a compound (multi-column) primary key, see [this answer](https://stackoverflow.com/a/54612248/2144390). – Gord Thompson Sep 24 '20 at 20:16
  • I'm trying to replicate step 1 in my current use case: I'm creating the sqlalchemy engine like so: `sa.create_engine("ibm_db_sa+pyodbc://?driver=IBM i Access ODBC Driver&SYSTEM=XXX&;Port=21&UID=XXX&PWD=XXX&Database=")` Then executing step 1: `df1.to_sql("WWNEXPORT.TEMP", engine, index=False, if_exists="replace")` But I receive the following error: `sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42S02', '[42S02] [IBM][System i Access ODBC Driver][DB2 for i5/OS]SQL0204 - TABLES of type *FILE in SYSCAT not found. (-204) (SQLPrepare)')` Do you maybe know why? – TheDude Jan 13 '22 at 14:55
  • @TheDude - Maybe try `df1.to_sql("TEMP", engine, schema="WWNEXPORT", index=False, if_exists="replace")` – Gord Thompson Jan 13 '22 at 16:06
  • Unfortunately, this isn't the solution and I get the same error. This is some additional information that comes with the error and that I couldn't post in the comment above due to the limitation of characters for comments: `[SQL: SELECT "SYSCAT"."TABLES"."TABNAME" FROM "SYSCAT"."TABLES" WHERE "SYSCAT"."TABLES"."TABSCHEMA" = ? AND "SYSCAT"."TABLES"."TABNAME" = ?] [parameters: ('WWNEXPORT', 'TEMP')] (Background on this error at: https://sqlalche.me/e/14/f405)` I don't understand how and why this SQL statement is generated. – TheDude Jan 13 '22 at 17:44
  • @TheDude - pandas `to_sql()` is calling SQLAlchemy `has_table()` to see if the table already exists, so SQLAlchemy is querying the SYSCAT (metadata) tables to see if your table shows up there. I have no experience with ibm_db_sa, unfortunately. – Gord Thompson Jan 13 '22 at 19:33
  • I checked the database for the SYSCAT table but didn't find anything. Do you have any experience working with a DB2 AS/400 database, by any chance? – TheDude Jan 17 '22 at 08:08