2

I've got an MS Access table (SearchAdsAccountLevel) which needs to be updated frequently from a python script. I've set up the pyodbc connection and now I would like to UPDATE/INSERT rows from my pandas df to the MS Access table based on whether the Date_ AND CampaignId fields match with the df data.

Looking at previous examples I've built the UPDATE statement which uses iterrows to iterate through all rows within df and execute the SQL code as per below:

    connection_string = (
            r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
            r"c:\AccessDatabases\Database2.accdb;"
    )
    cnxn = pyodbc.connect(connection_string, autocommit=True)
    crsr = cnxn.cursor()

    for index, row in df.iterrows():
            crsr.execute("UPDATE SearchAdsAccountLevel SET [OrgId]=?, [CampaignName]=?, [CampaignStatus]=?, [Storefront]=?, [AppName]=?, [AppId]=?, [TotalBudgetAmount]=?, [TotalBudgetCurrency]=?, [DailyBudgetAmount]=?, [DailyBudgetCurrency]=?, [Impressions]=?, [Taps]=?, [Conversions]=?, [ConversionsNewDownloads]=?, [ConversionsRedownloads]=?, [Ttr]=?, [LocalSpendAmount]=?, [LocalSpendCurrency]=?, [ConversionRate]=?, [Week_]=?, [Month_]=?, [Year_]=?, [Quarter]=?, [FinancialYear]=?, [RowUpdatedTime]=? WHERE [Date_]=? AND [CampaignId]=?",
                        row['OrgId'],
                        row['CampaignName'],
                        row['CampaignStatus'],
                        row['Storefront'],
                        row['AppName'],
                        row['AppId'],
                        row['TotalBudgetAmount'],
                        row['TotalBudgetCurrency'],
                        row['DailyBudgetAmount'],
                        row['DailyBudgetCurrency'],
                        row['Impressions'],
                        row['Taps'],
                        row['Conversions'],
                        row['ConversionsNewDownloads'],
                        row['ConversionsRedownloads'],
                        row['Ttr'],
                        row['LocalSpendAmount'],
                        row['LocalSpendCurrency'],
                        row['ConversionRate'],
                        row['Week_'],
                        row['Month_'],
                        row['Year_'],
                        row['Quarter'],
                        row['FinancialYear'],
                        row['RowUpdatedTime'],
                        row['Date_'],
                        row['CampaignId'])
crsr.commit()

I would like to iterate through each row within my df (around 3000) and if the ['Date_'] AND ['CampaignId'] match I UPDATE all other fields. Otherwise I want to INSERT the whole df row in my Access Table (create new row). What's the most efficient and effective way to achieve this?

Dylan_w
  • 472
  • 5
  • 19
Mtra
  • 201
  • 4
  • 10

1 Answers1

3

Consider DataFrame.values and pass list into an executemany call, making sure to order columns accordingly for the UPDATE query:

cols = ['OrgId', 'CampaignName', 'CampaignStatus', 'Storefront',
        'AppName', 'AppId', 'TotalBudgetAmount', 'TotalBudgetCurrency',
        'DailyBudgetAmount', 'DailyBudgetCurrency', 'Impressions',
        'Taps', 'Conversions', 'ConversionsNewDownloads', 'ConversionsRedownloads',
        'Ttr', 'LocalSpendAmount', 'LocalSpendCurrency', 'ConversionRate',
        'Week_', 'Month_', 'Year_', 'Quarter', 'FinancialYear',
        'RowUpdatedTime', 'Date_', 'CampaignId']

sql = '''UPDATE SearchAdsAccountLevel 
            SET [OrgId]=?, [CampaignName]=?, [CampaignStatus]=?, [Storefront]=?, 
                [AppName]=?, [AppId]=?, [TotalBudgetAmount]=?, 
                [TotalBudgetCurrency]=?, [DailyBudgetAmount]=?, 
                [DailyBudgetCurrency]=?, [Impressions]=?, [Taps]=?, [Conversions]=?, 
                [ConversionsNewDownloads]=?, [ConversionsRedownloads]=?, [Ttr]=?, 
                [LocalSpendAmount]=?, [LocalSpendCurrency]=?, [ConversionRate]=?,
                [Week_]=?, [Month_]=?, [Year_]=?, [Quarter]=?, [FinancialYear]=?, 
                [RowUpdatedTime]=? 
          WHERE [Date_]=? AND [CampaignId]=?'''

crsr.executemany(sql, df[cols].values.tolist())   
cnxn.commit()

For the insert, use a temp, staging table with exact structure as final table which you can create with make-table query: SELECT TOP 1 * INTO temp FROM final. This temp table will be regularly cleaned out and inserted with all data frame rows. The final query migrates only new rows from temp into final with NOT EXISTS, NOT IN, or LEFT JOIN/NULL. You can run this query anytime and never worry about duplicates per Date_ and CampaignId columns.

# CLEAN OUT TEMP
sql = '''DELETE FROM SearchAdsAccountLevel_Temp'''
crsr.executemany(sql)   
cnxn.commit()

# APPEND TO TEMP
sql = '''INSERT INTO SearchAdsAccountLevel_Temp (OrgId, CampaignName, CampaignStatus, Storefront,
                                AppName, AppId, TotalBudgetAmount, TotalBudgetCurrency,
                                DailyBudgetAmount, DailyBudgetCurrency, Impressions,
                                Taps, Conversions, ConversionsNewDownloads, ConversionsRedownloads,
                                Ttr, LocalSpendAmount, LocalSpendCurrency, ConversionRate,
                                Week_, Month_, Year_, Quarter, FinancialYear,
                                RowUpdatedTime, Date_, CampaignId)    
         VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, 
                 ?, ?, ?, ?, ?, ?, ?, ?, ?, 
                 ?, ?, ?, ?, ?, ?, ?, ?, ?);'''

crsr.executemany(sql, df[cols].values.tolist())   
cnxn.commit()

# MIGRATE TO FINAL
sql = '''INSERT INTO SearchAdsAccountLevel 
         SELECT t.* 
         FROM SearchAdsAccountLevel_Temp t
         LEFT JOIN SearchAdsAccountLevel f
            ON t.Date_ = f.Date_ AND t.CampaignId = f.CampaignId
         WHERE f.OrgId IS NULL'''
crsr.executemany(sql)   
cnxn.commit()
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • how can I also integrate an insert query to create rows which are not currently present in my table but are in df? – Mtra May 02 '19 at 18:32
  • That is a different question than this as it involves a different query and data handling. Usually, staging temp tables are used to avoid append duplicates with [`NOT IN, NOT EXISTS, LEFT JOIN/IS NULL`](https://stackoverflow.com/a/2513187/1422451) clauses against final tables in an `INSERT...SELECT` call. – Parfait May 02 '19 at 18:40
  • +1 for the `df[cols].values` hint. To be fair, the "insert" aspect was part of the question (although buried near the bottom), and that is covered by [this answer](https://stackoverflow.com/a/43919138/2144390) via a comment to [the dup](https://stackoverflow.com/q/55901296/2144390). – Gord Thompson May 03 '19 at 00:45
  • @Mtra, see edit with `INSERT` query section which as mentioned requires use of a temp table. – Parfait May 03 '19 at 13:36
  • Instead of creating a temp table, can I run an Insert statement with ON DUPLICATE KEY UPDATE on the Date_ and CampaignId fields? – Mtra May 06 '19 at 09:33
  • MS Access does not support such an `INSERT` statement though other DBMS' may. Also, you only need to create temp table once. – Parfait May 06 '19 at 12:43
  • When trying the #Append to Temp step I'm getting the following error: DataError: ('22018', '[22018] [Microsoft][ODBC Microsoft Access Driver] Data type mismatch in criteria expression. (-3030) (SQLExecDirectW)') – Mtra May 07 '19 at 09:28
  • 1
    One or more of your data frame column types do not match database table types. Try importing a `select` query of few rows with `read_sql` and compare dtypes with your current df. – Parfait May 07 '19 at 12:32
  • Yeah you were right. I had some 'None' values which weren't being recognized. I've managed to update my final table and insert data into the temp table. Now when trying to migrate temp table to final table I'm getting the following: IntegrityError: ('23000' [Microsoft][ODBC Microsoft Access Driver] The changes you requested to the table were not successful because they would create duplicate values in the index, primary key, or relationship. Change the data in the field or fields that contain duplicate data, remove the index, or redefine the index to permit duplicate entries and try again.) – Mtra May 08 '19 at 09:47
  • I thought this was happening due to having duplicates in the index column ('ID') which auto increments in MS Access. I tried to workaround this by omitting this column in the INSERT and SELECT statements but I'm still getting the same error. – Mtra May 08 '19 at 09:50
  • 1
    You need to explicitly name the columns in last append query's `INSERT` and `SELECT` clauses. For latter, do not use the abbreviated `*`. – Parfait May 08 '19 at 12:25