Finding matching rows in pandas columns and performing operation

Question

Suppose I have two dataframes, call them dfA and dfB, defined as follows:

AIDs = np.array(['y5','x5','x1','x3','y1','x4','y4','y3','y2','x2'])
AValues = np.random.randint(10, 100, size=10)

dfA = pd.DataFrame({'IDs': AIDs,
                   'Values': AValues})

BIDs = np.random.choice(AIDs, size=30)
BValues = np.random.randint(1000, 3000, size=30)
Differences = np.zeros(30)
dfB = pd.DataFrame({'IDs': BIDs,
                    'Values': BValues,
                    'Differences': Differences})

The goal is to find the corresponding ID from dfB in dfA, then subtract the value in dfA from the value in dfB at the corresponding ID location. Finally, all the values should be stored in the values column in dfB.

For instance if:

dfB = 
|index|IDs|Values|Differences|
|---|---|---|---|
|0|y2|48|0\.0|
|1|y4|47|0\.0|
|2|y1|22|0\.0|
|3|x2|86|0\.0|
|4|x1|67|0\.0|
|5|x2|65|0\.0|
|6|y2|97|0\.0|
|7|x2|89|0\.0|
|8|y3|51|0\.0|
|9|y1|74|0\.0|
|10|x3|99|0\.0|
|11|x5|51|0\.0|
|12|y1|82|0\.0|
|13|y3|73|0\.0|
|14|y1|56|0\.0|
|15|x3|99|0\.0|
|16|y5|44|0\.0|
|17|x4|57|0\.0|
|18|x4|24|0\.0|
|19|y1|28|0\.0|
|20|x5|79|0\.0|
|21|y3|75|0\.0|
|22|y1|34|0\.0|
|23|x2|15|0\.0|
|24|y2|59|0\.0|
|25|y5|33|0\.0|
|26|y5|36|0\.0|
|27|y5|13|0\.0|
|28|x4|85|0\.0|
|29|y5|35|0\.0|

dfA = 
|index|IDs|Values|
|---|---|---|
|0|y5|1622|
|1|x5|1294|
|2|x1|1722|
|3|x3|1108|
|4|y1|1499|
|5|x4|1123|
|6|y4|2866|
|7|y3|1075|
|8|y2|1767|
|9|x2|2748|

then dfB['Differences'] = 
|index|Differences|
|---|---|
|0|1767-48|
|1|2866-47|
|2|1499-22|
|3|2748-86|
.....
etc.

`dfB['difference'] = dfB['IDs'].map(dfA.set_index('IDs')['Values']).sub(dfB['Values'])` — mozway, Apr 23 '23 at 06:23

score 0 · Answer 1 · answered Apr 23 '23 at 06:19

There exists a "merge" in pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html

For example you can do

dfC = pd.merge(dfA, dfB, on="ID")

Then you need to calculate the differences. for this yoz can use a function like this

 dfC['Differences'] = dfC.apply(lambda row: ...)

Finding matching rows in pandas columns and performing operation

1 Answers1