Adding two numeric pandas columns with different lengths based on condition

Question

I am writing a piece of simulation software in Python using pandas. Here is my problem:

Imagine you have two pandas dataframes, dfA and dfB, with numeric columns A and B respectively. The dataframes have different numbers of rows, n and m, with n > m. In addition, dfA includes a binary column C, which contains m ones and is 0 otherwise. Assume both dfA and dfB are sorted.

I want to add the values in B, in order, to the values in column A in the rows where column C == 0.

In the example n = 6, m = 3.

Example data:

import pandas as pd

dataA = {'A': [7, 7, 7, 7, 7, 7],
         'C': [1, 0, 1, 0, 0, 1]}
dfA = pd.DataFrame(dataA)
dfB = pd.DataFrame([3, 5, 4], columns=['B'])

Example pseudocode: DOES NOT WORK

if dfA['C'] == 1:
    dfD['D'] = dfA['A']
else:
    dfD['D'] = dfA['A'] + dfB['B']

Expected result:

dfD['D']
[7,10,7,12,11,7]

I can only think of obscure for loops with index counters for each of the three vectors, but I am sure that there is a faster way by writing a function and using apply. But maybe there is something completely different that I am missing.

NOTE: In the real problem the rows are not single values but row vectors of equal length. Moreover, the real operation is not simple addition but a weighted average over the two row vectors.

score 0 · Accepted Answer · answered Jan 30 '23 at 12:09

You can use:

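# m is True for the rows where A is kept unchanged
m = dfA['C'].eq(1)
# re-index B onto the C == 0 rows so the addition aligns row by row
dfA['D'] = dfA['A'].where(m, dfA['A'] + dfB['B'].set_axis(dfA.index[~m]))

Or:

dfA.loc[m, 'D'] = dfA.loc[m, 'A']
dfA.loc[~m, 'D'] = dfA.loc[~m, 'A'] + dfB['B'].values

Output (values of column D):

[7, 10, 7, 12, 11, 7]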
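For the row-vector case mentioned in the question's note, the same boolean-mask idea can be applied with plain NumPy arrays. This is only a minimal sketch under assumptions: the second column of the row vectors, the weights `w_a` and `w_b`, and the array names are all made up for illustration, since the question does not give the real data or the real weighting.

```python
import numpy as np

# Hypothetical data: every row is a vector of equal length (here 2 values).
A = np.array([[7.0, 7.0]] * 6)                        # n = 6 rows
B = np.array([[3.0, 1.0], [5.0, 2.0], [4.0, 3.0]])    # m = 3 rows
C = np.array([1, 0, 1, 0, 0, 1])

# Hypothetical weights for the weighted average (not from the question).
w_a, w_b = 0.75, 0.25

mask = C == 0
# The rows selected by the mask consume B in order, so the counts must match.
assert mask.sum() == len(B)

D = A.copy()
# Boolean indexing returns the C == 0 rows in order, so B lines up row by row.
D[mask] = w_a * A[mask] + w_b * B

print(D)
```

Rows where C == 1 are copied from `A` unchanged; only the masked rows are replaced by the weighted combination.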

mozway

Thank you very much, this is exactly what I was searching for. I am not too experienced with python, and this is very informative, so thanks again! – jorisvd Jan 30 '23 at 13:32

Ledian K. · Answer 2 · 2023-01-30T20:27:37.770

The other answer is pretty clever. I am just showing a different way, in case you would like to do it using a loop:

# Create an empty df
dfD = pd.DataFrame()

# Create Loop
k = 0
for i in range(len(dfA)):
    if dfA.loc[i, "C"] == 1:
        dfD.loc[i, "D"] = dfA.loc[i, "A"]
    else:
        dfD.loc[i, "D"] = dfA.loc[i, "A"] + dfB.loc[k, "B"]
        k = k + 1

# Show results
dfD
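A variant of the same loop, sketched here as one possible alternative rather than a definitive implementation, collects the values in a list and builds `dfD` once at the end, which avoids growing the DataFrame row by row. It uses the example data from the question; the names `b_values` and `result` are made up for illustration. It also checks up front that the number of C == 0 rows equals the number of rows in dfB, which the in-order pairing relies on.

```python
import pandas as pd

dfA = pd.DataFrame({'A': [7, 7, 7, 7, 7, 7],
                    'C': [1, 0, 1, 0, 0, 1]})
dfB = pd.DataFrame([3, 5, 4], columns=['B'])

# The C == 0 rows consume dfB in order, so the counts must match.
assert (dfA['C'] == 0).sum() == len(dfB)

b_values = iter(dfB['B'])  # hands out the next B value on demand
result = []
for a, c in zip(dfA['A'], dfA['C']):
    # next() is only called for C == 0 rows, playing the role of the counter k
    result.append(a if c == 1 else a + next(b_values))

# Build the result frame in one step, reusing dfA's index
dfD = pd.DataFrame({'D': result}, index=dfA.index)
print(dfD['D'].tolist())  # → [7, 10, 7, 12, 11, 7]
```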

edited Jan 30 '23 at 20:27

answered Jan 30 '23 at 13:55
