0

Suppose I have two dataframes, call them dfA and dfB, defined as follows:

AIDs = np.array(['y5','x5','x1','x3','y1','x4','y4','y3','y2','x2'])
AValues = np.random.randint(10, 100, size=10)

dfA = pd.DataFrame({'IDs': AIDs,
                   'Values': AValues})

BIDs = np.random.choice(AIDs, size=30)
BValues = np.random.randint(1000, 3000, size=30)
Differences = np.zeros(30)
dfB = pd.DataFrame({'IDs': BIDs,
                    'Values': BValues,
                    'Differences': Differences})

The goal is to find the corresponding ID from dfB in dfA, then subtract the value in dfA from the value in dfB at the corresponding ID location. Finally, all the values should be stored in the values column in dfB.

For instance if:

dfB = 
|index|IDs|Values|Differences|
|---|---|---|---|
|0|y2|48|0\.0|
|1|y4|47|0\.0|
|2|y1|22|0\.0|
|3|x2|86|0\.0|
|4|x1|67|0\.0|
|5|x2|65|0\.0|
|6|y2|97|0\.0|
|7|x2|89|0\.0|
|8|y3|51|0\.0|
|9|y1|74|0\.0|
|10|x3|99|0\.0|
|11|x5|51|0\.0|
|12|y1|82|0\.0|
|13|y3|73|0\.0|
|14|y1|56|0\.0|
|15|x3|99|0\.0|
|16|y5|44|0\.0|
|17|x4|57|0\.0|
|18|x4|24|0\.0|
|19|y1|28|0\.0|
|20|x5|79|0\.0|
|21|y3|75|0\.0|
|22|y1|34|0\.0|
|23|x2|15|0\.0|
|24|y2|59|0\.0|
|25|y5|33|0\.0|
|26|y5|36|0\.0|
|27|y5|13|0\.0|
|28|x4|85|0\.0|
|29|y5|35|0\.0|

dfA = 
|index|IDs|Values|
|---|---|---|
|0|y5|1622|
|1|x5|1294|
|2|x1|1722|
|3|x3|1108|
|4|y1|1499|
|5|x4|1123|
|6|y4|2866|
|7|y3|1075|
|8|y2|1767|
|9|x2|2748|

then dfB['Differences'] = 
|index|Differences|
|---|---|
|0|1767-48|
|1|2866-47|
|2|1499-22|
|3|2748-86|
.....
etc. 
PurpleSky
  • 111
  • 5

1 Answers1

0

There exists a "merge" in pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html

For example you can do

dfC = pd.merge(dfA, dfB, on="ID")

Then you need to calculate the differences. for this yoz can use a function like this

 dfC['Differences'] = dfC.apply(lambda row: ...)
René Winkler
  • 6,508
  • 7
  • 42
  • 69