Update on SQL Server table from Python Pandas

Question

The following Python code updates records in the required database table. Is there a better way to do this?

I read on SO that iterating over a DataFrame row by row is time-consuming. What is a better approach?

for index, row in outputData.iterrows():
    try:
        updatesql = " update table set [fieldname] = {0:f}   where dt = '{1:s}'".format(fieldvalue, currentdt)
        updatecursor.execute(updatesql)
        sql_conn.commit()
    except IOError as e:
        print("({})".format(e))
    except (RuntimeError, TypeError, NameError) as e:
        print("({})".format(e))

Based on the discussion below, I made the following changes but am facing two problems.

updatesql = " update table set [fieldname] = ? where dt = ?"
data = (outputData.reindex( ['fieldvalue'], currentDt,axis='columns').to_numpy())
# EXECUTE QUERY AND BIND LIST OF TUPLES
updatecursor.executemany(updatesql, data.tolist())
sql_conn.commit()

Problems: a) The date is a constant and is not part of the outputData DataFrame. b) Float values are stored in scientific notation; I would prefer them to be stored with a fixed precision.

Before anything, stop using the modulo operator `%` for string formatting. This [method has been de-emphasized in Python but not officially deprecated *yet*](https://stackoverflow.com/a/13452357/1422451). Instead, use the preferred `str.format` (Python 2.6+) or the newer F-string (Python 3.6+). (And actually you should be using SQL parameterization anyway for this question). – Parfait, Nov 27 '20 at 21:37
Using string formatting to insert data *values* into an SQL statement is still a practice to be discouraged. Also, looping through the DataFrame row-by-row with `.execute()` is less efficient than `.executemany()` (or the SQLAlchemy equivalent in [my answer](https://stackoverflow.com/a/65044758/2144390)). – Gord Thompson, Nov 30 '20 at 19:45
@GordThompson, please look at the latest statement. I no longer loop through the DataFrame, but I am still looking for a way to format the float values. – nsivakr, Nov 30 '20 at 19:52

Parfait · Accepted Answer · 2020-11-30T21:45:05.960

2

Consider executemany to avoid the for-loop, passing it the rows of a numpy array produced by DataFrame.to_numpy(). Both examples below use SQL parameterization rather than string formatting.

With iterrows + cursor.execute (to demonstrate parameterization)

# PREPARED STATEMENT (NO DATA)
updatesql = "UPDATE table SET [fieldname] = ?  WHERE dt = ?"

for index, row in outputData.iterrows():
    try:
        # EXECUTE QUERY AND BIND TUPLE OF PARAMS
        updatecursor.execute(updatesql, (fieldvalue, currentdt))
    except:
        ...

sql_conn.commit()
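
As a rough illustration of the row-by-row pattern above, the sketch below uses the standard-library `sqlite3` module as a stand-in for the SQL Server connection. That substitution is an assumption made only so the example runs anywhere; like pyodbc, it accepts `?` placeholders. The table `prices`, its columns, and the sample rows are hypothetical. Each row's own values are bound as parameters, the driver's error class is caught instead of a bare `except`, updates that match no row are recorded, and the commit happens once after the loop.

```python
import sqlite3

# Stand-in database; with SQL Server this would be a pyodbc connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE prices (dt TEXT PRIMARY KEY, fieldname REAL)")
cur.executemany(
    "INSERT INTO prices (dt, fieldname) VALUES (?, ?)",
    [("2020-11-27", 0.0), ("2020-11-30", 0.0)],
)

# Hypothetical rows, standing in for what DataFrame.iterrows() would yield.
rows = [
    {"dt": "2020-11-27", "value": 1.5},
    {"dt": "2020-11-30", "value": 2.25},
    {"dt": "1999-01-01", "value": 9.0},  # no matching row in the table
]

# Statement text contains placeholders only, never data values.
update_sql = "UPDATE prices SET fieldname = ? WHERE dt = ?"
unmatched = []

for row in rows:
    try:
        cur.execute(update_sql, (row["value"], row["dt"]))
        if cur.rowcount == 0:
            # The WHERE clause matched nothing; remember the key.
            unmatched.append(row["dt"])
    except sqlite3.Error as e:
        print(f"update failed for {row['dt']}: {e}")

# Commit once, after all rows have been processed.
conn.commit()

result = cur.execute("SELECT dt, fieldname FROM prices ORDER BY dt").fetchall()
```

With pyodbc the corresponding error class would be `pyodbc.Error`; the rest of the loop has the same shape.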

With to_numpy + cursor.executemany

# PREPARED STATEMENT (NO DATA)
updatesql = "UPDATE table SET [fieldname] = ?  WHERE dt = ?"

# ROUND TO SCALE OR HOW MANY DECIMAL POINTS OF COLUMN TYPE
outputData['my_field_col'] = outputData['my_field_col'].round(4)

# ADD A NEW COLUMN TO DATA FRAME EQUAL TO CONSTANT VALUE   
outputData['currentDt'] = currentDt
                        
# SUBSET DATA BY NEEDED COLUMNS CONVERT TO NUMPY ARRAY
data = (outputData.reindex(['my_field_col', 'currentDt'], axis='columns').to_numpy())

# EXECUTE QUERY AND BIND LIST OF TUPLES
updatecursor.executemany(updatesql, data.tolist())
sql_conn.commit()
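
The same idea can be sketched without pandas, to show what the list of tuples passed to `executemany` looks like once the values are rounded and the constant date is attached to every row. This again uses `sqlite3` as a stand-in for SQL Server (an assumption for runnability), and the `prices` table, the `id` key column, and the sample values are hypothetical. Whether rounding alone changes how the values are displayed depends on the column type and on the client used to view them, which the thread does not state.

```python
import sqlite3

# Stand-in database; with SQL Server this would be a pyodbc connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE prices (id INTEGER, dt TEXT, fieldname REAL)")
cur.executemany(
    "INSERT INTO prices (id, dt, fieldname) VALUES (?, ?, ?)",
    [(1, "2020-11-30", 0.0), (2, "2020-11-30", 0.0)],
)

current_dt = "2020-11-30"                   # constant, not part of the data
values = {1: 0.000123456, 2: 12345.678949}  # hypothetical id -> new value

# Round to the column's scale and attach the constant date to every tuple.
params = [(round(v, 4), current_dt, key) for key, v in values.items()]

# One call binds the whole list of tuples to the placeholders.
cur.executemany(
    "UPDATE prices SET fieldname = ? WHERE dt = ? AND id = ?", params
)
conn.commit()

result = cur.execute("SELECT id, fieldname FROM prices ORDER BY id").fetchall()
```

Building the tuples in a comprehension keeps the constant out of the DataFrame entirely; adding it as a new column, as in the answer's code, reaches the same parameter list by a different route.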

edited Nov 30 '20 at 21:45

answered Nov 27 '20 at 21:50

Parfait


Thanks. This sounds very promising. Will experiment and get back. – nsivakr Nov 28 '20 at 12:09
Great to hear. Did the solution work? If not, what issues did you face? – Parfait Nov 30 '20 at 01:35
How can I use in-place substitution but still format the float values? If I don't format them, the values are written to the database in scientific notation. – nsivakr Nov 30 '20 at 19:23
I added my updated code, but it doesn't work. Please let me know how to fix it. – nsivakr Nov 30 '20 at 19:40
See edits using `round` to match decimal points of column type and assigning date constant as a new data frame column. – Parfait Nov 30 '20 at 21:46
Thanks. Will do testing and get back. – nsivakr Nov 30 '20 at 21:49

score 2 · Answer 2 · answered Nov 27 '20 at 23:04

Here's another way you could do it that would also take advantage of pyodbc's fast_executemany=True:

import sqlalchemy as sa

# …

print(outputData)  # DataFrame containing updates
"""console output:
   my_field_col my_date_col
0             0  1940-01-01
1             1  1941-01-01
2             2  1942-01-01
…
"""

engine = sa.create_engine(connection_uri, fast_executemany=True)

update_stmt = sa.text(
    f"UPDATE [{table_name}] SET [fieldname] = :my_field_col WHERE dt = :my_date_col"
)
update_data = outputData.to_dict(orient="records")
with engine.begin() as conn:
    conn.execute(update_stmt, update_data)
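
For the statement above to bind correctly, the keys of each dict in `update_data` must match the `:name` parameters in the SQL text. The sketch below illustrates that list-of-dicts binding using the standard-library `sqlite3` module, which also accepts `:name` placeholders, as a stand-in for SQLAlchemy and SQL Server (an assumption for runnability). The `demo` table is hypothetical, and the records are written out by hand in the shape that `to_dict(orient="records")` would return for the printed DataFrame.

```python
import sqlite3

# Same shape as outputData.to_dict(orient="records") for the sample data.
update_data = [
    {"my_field_col": 0, "my_date_col": "1940-01-01"},
    {"my_field_col": 1, "my_date_col": "1941-01-01"},
    {"my_field_col": 2, "my_date_col": "1942-01-01"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (dt TEXT PRIMARY KEY, fieldname INTEGER)")

# Seed one row per date so the UPDATE has something to match.
conn.executemany(
    "INSERT INTO demo (dt, fieldname) VALUES (?, NULL)",
    [(rec["my_date_col"],) for rec in update_data],
)

# Each dict supplies the values for the named parameters of one execution.
conn.executemany(
    "UPDATE demo SET fieldname = :my_field_col WHERE dt = :my_date_col",
    update_data,
)
conn.commit()

result = conn.execute("SELECT dt, fieldname FROM demo ORDER BY dt").fetchall()
```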
