0

I have a requirement where i have some input data in one df which needs to be compare the 'name' column with other df which has same column, and if the match is found i need to do update those to my table.

Issue: The last row of my data_to_update is applying to all the columns.

What i have tried so far:

import pandas as pd
import numpy as np
import cx_Oracle

conn = cx_Oracle.connect('xxxxx', 'yyyyyy',dsn_tns)
cursor = conn.cursor()

data = [{'name': 'ABC', 'col1': 10, 'col2': 20, 'col3': 'John'},
        {'name': 'DEF', 'col1': 30, 'col2': 40, 'col3': 'Peter'},
        {'name': 'PQR', 'col1': 50, 'col2': 60, 'col3': 'Mary'},
        {'name': 'XYZ', 'col1': 70, 'col2': 80, 'col3': 'Robert'}]
df = pd.DataFrame(data)

data2 = [{'name': 'ABC', 'col1': 10, 'col2': 20000, 'col3': 'XXXX'},
        {'name': 'DEF', 'col1': 30, 'col2': 40, 'col3': 'Peter'},
        {'name': 'PQR', 'col1': 50, 'col2': 60, 'col3': 'Mary'},
        {'name': 'XYZ', 'col1': 70, 'col2': 80000, 'col3': 'YYYY'}]
df2 = pd.DataFrame(data)
    
df['match'] = np.where(df['name'].isin(df2['name']), 1, 0)

exist_df = df[df['match'] == 1]
del exist_df['match']

new_df = df[df['match'] == 0]
del new_df['match']

update_list = exist_df['name'].tolist()

to_update =  "','".join(update_list)
to_update1 = "('" + to_update + "')"

data_to_update = [tuple(x) for x in exist_df[['col2','col3']].values]

update_query = ''' update mytable set col2 =: col2, col3 =: col3 where name in ''' + to_update1

cursor.executemany(update_query,data_to_update)
conn.commit()

My table data before is:

name  col1  col2  col3
ABC    10    20   John
DEF    30    40   Peter
PQR    50    60   Mary
XYZ    70    80   Robert

Data after running above code is:

name  col1  col2     col3
XYZ    70    80000   YYYY
XYZ    70    80000   YYYY
XYZ    70    80000   YYYY
XYZ    70    80000   YYYY

But the expected table data after the process:

name  col1  col2    col3
ABC    10   20000   XXXX
DEF    30    40     Peter
PQR    50    60     Mary
XYZ    70   80000   YYYY

Any help is highly appreciated, thanks in advance!

Mr.B
  • 51
  • 6

2 Answers2

2

You need to execute a separate statement for each name. Change your tuple so it also includes the name column, then you can match that with a placeholder. executemany will then update each row with its corresponding values.

data_to_update = [tuple(x) for x in exist_df[['col2','col3', 'name']].values]
sql = 'UPDATE table SET col2 = :col2, col3 = :col3 WHERE name = :name'
cursor.executemany(sql, data_to_update)
Christopher Jones
  • 9,449
  • 3
  • 24
  • 48
Barmar
  • 741,623
  • 53
  • 500
  • 612
  • I have a question please, what if the name is just a string var and holds values like name = ('ABC', 'XYZ', 'LMN') and need to use it in the WHERE IN clause – Mr.B Apr 27 '21 at 19:49
  • 1
    See https://cx-oracle.readthedocs.io/en/latest/user_guide/bind.html#binding-multiple-values-to-a-sql-where-in-clause – Barmar Apr 27 '21 at 19:51
  • 1
    Adding a link for reference: In some data manipulation cases MERGE might be useful, for example see https://stackoverflow.com/questions/67161376/is-there-a-way-to-improve-a-merge-query Other than whether it does what you want with the data, then you have to assess whether it's more efficient to do the processing in Python or the DB, and what the cost of transferring data from Python to the DB is. – Christopher Jones Apr 27 '21 at 22:37
0

The =: syntax must be something special with the cx_Oracle connector. Remember that update_list contains every name in your dataframe, so executing a single UPDATE with WHERE NAME IN update_list is definitely going to set all the rows to a single value. You're going to need 4 separate UPDATE statements to update 4 rows to different values.

Tim Roberts
  • 48,973
  • 4
  • 21
  • 30