How to perform reconciliation using DataFrame?

Question

Let's say I have a DataFrame that I get weekly and would like to update a tracker that is updated based on a weekly report, for example:

This is a weekly report I receive:

ID   Cost
X12  500
X54  100
X52  150
X45  200
X32  435

I have a DataFrame for the main metric tracker, that needs to be updated based on the weekly report:

I would like to take the numbers from DataFrame 1 and put them into DataFrame 2, which would result in:

How would I go about performing a function like this, from one DataFrame onto another. I assume we use a for loop and search through the second DataFrame while iterating through the first, but how would I set this up?

Thanks!

score 1 · Answer 1 · answered Aug 23 '17 at 18:16

1

If order isn't an issue, then

In [1178]: dfw.set_index('ID').combine_first(dfo.set_index('ID')).reset_index()
Out[1178]:
    ID    Cost
0  X12   500.0
1  X32   435.0
2  X37  4664.0
3  X45   200.0
4  X48   547.0
5  X52   150.0
6  X52   150.0
7  X54   100.0
8  X57   467.0
9  X76    34.0

This will take care of any new ID additions as well.

answered Aug 23 '17 at 18:16

Zero

74,117
18
147
154

Would I be able to do this whilst keeping the same order I provided? – sgerbhctim Aug 23 '17 at 18:20
Also, would this skip over NaN values in the ID? – sgerbhctim Aug 23 '17 at 18:26

score 1 · Answer 2 · answered Aug 23 '17 at 18:39

1

You can use map and combine_first

df2['Cost'] = df2['ID'].map(df1.set_index('ID')['Cost']).combine_first(df2['Cost'])

You get

    ID  Cost
0   X12 500.0
1   X54 100.0
2   X52 150.0
3   X45 200.0
4   X37 4664.0
5   X76 34.0
6   X57 467.0
7   X52 150.0
8   X48 547.0
9   X32 435.0

answered Aug 23 '17 at 18:39

Vaishali

37,545
5
58
86

Let's say I have an ID with a NaN value. Would it skip over it? – sgerbhctim Aug 23 '17 at 18:41
1

ID with nan in df1 or df2? Say we test the code by introducing nan in df2, df2.loc[df2.ID == 'X12', 'Cost'] = np.nan. The code I have suggested works. – Vaishali Aug 23 '17 at 18:47

score 0 · Answer 3 · answered Aug 23 '17 at 18:16

0

You can do like this.

where 
df -> first dataframe
df1 -> second dataframe

Code:

df.loc[df.ID.isin(df1.ID), ['Cost']] = df1[['Cost']]
print df

Reference:

Replace column values based on another dataframe

answered Aug 23 '17 at 18:16

Naren Murali

19,250
3
27
54

score 0 · Answer 4 · answered Aug 23 '17 at 19:24

Using drop_duplicates

rep.append(track).drop_duplicates('ID')

    ID  Cost
0  X12   500
1  X54   100
2  X52   150
3  X45   200
4  X32   435
4  X37  4664
5  X76    34
6  X57   467
8  X48   547

If you want to reset your index

rep.append(track).drop_duplicates('ID').reset_index(drop=True)

    ID  Cost
0  X12   500
1  X54   100
2  X52   150
3  X45   200
4  X32   435
5  X37  4664
6  X76    34
7  X57   467
8  X48   547

How to perform reconciliation using DataFrame?

4 Answers4