0

I need to compare new datasets to existing sql datasets and update them if new information is presenting itself.

data from db:
dfa 
id   foo    bar       
1     2     "home"
2     5     "work"
3     6     "car"
4     99    "people"

new data:
dfb
id   foo    bar 
1     22    "home"
2     5     "work"
8     8     "pet"
4     99    "humans"

What I need is a way to recognize that for id 1, there is a different value in column foo and that for id 4 there is a new value for column bar. And then update the dataframe from the db before sending it back to the db. I'd like to do this in a runtime efficient maner.

dfout
id   foo    bar       
1     22     "home"
2     5     "work"
3     6     "car"
4     99    "humans"

I have searched the web for a solution. But I can't find my specific case and I have trouble fitting what I do find into my case. Can someone explain how I would do this?

These seem related but deal with non overlapping data and entire new rows. Pandas sort columns and find difference Python Pandas - Find difference between two data frames

SK4ndal
  • 39
  • 6

1 Answers1

2

Use DataFrame.update by Id, so first are converted columns to index in both DataFrames:

df1 = dfa.set_index('id')
df2 = dfb.set_index('id')

df1.update(df2)
dfa = df1.reset_index().astype(dfa.dtypes)
print (dfa)
   id  foo     bar
0   1   22    home
1   2    5    work
2   3    6     car
3   4   99  humans
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252