I have a DataFrame df like the following:

df   A   B    C
0    1   0.7 0.3
1    0   0.2 0.8
2    0   0.8 0.2
3    1   0.6 0.4
4    1   0.9 0.1

I want to create a column D whose value is (1-B) where A==1 and (1-C) where A==0, so the result would be:

df   A   B    C    D
0    1   0.7 0.3  0.3
1    0   0.2 0.8  0.2
2    0   0.8 0.2  0.8
3    1   0.6 0.4  0.4
4    1   0.9 0.1  0.1
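
For anyone who wants to reproduce this, the input frame can be rebuilt as in the minimal sketch below. It only restates the table above and assumes pandas is installed.

```python
import pandas as pd

# Rebuild the example frame from the question (columns A, B, C only).
df = pd.DataFrame({
    "A": [1, 0, 0, 1, 1],
    "B": [0.7, 0.2, 0.8, 0.6, 0.9],
    "C": [0.3, 0.8, 0.2, 0.4, 0.1],
})
print(df)
```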
emax

2 Answers

If B and C always sum to 1 in every row, you can use numpy.where without subtracting, because 1 - B equals C and 1 - C equals B:

import numpy as np

df['D'] = np.where(df['A'] == 0, df['B'], df['C'])
print (df)
   A    B    C    D
0  1  0.7  0.3  0.3
1  0  0.2  0.8  0.2
2  0  0.8  0.2  0.8
3  1  0.6  0.4  0.4
4  1  0.9  0.1  0.1
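
The shortcut above depends on B + C being 1 in every row. A possible way to check that before relying on it is sketched below. The name `sums_to_one` is only illustrative, and the tolerance is the `np.isclose` default, which is an assumption about what counts as "equal to 1" for this data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 0, 0, 1, 1],
    "B": [0.7, 0.2, 0.8, 0.6, 0.9],
    "C": [0.3, 0.8, 0.2, 0.4, 0.1],
})

# Floating-point sums are rarely exactly 1, so compare with a tolerance.
sums_to_one = np.isclose(df["B"] + df["C"], 1.0).all()

if sums_to_one:
    # Safe to swap the columns instead of subtracting.
    df["D"] = np.where(df["A"] == 0, df["B"], df["C"])
else:
    # Fall back to the explicit formula.
    df["D"] = np.where(df["A"] == 0, 1 - df["C"], 1 - df["B"])
print(df)
```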

If you want to use the formula explicitly and column A contains only the values 0 and 1:

df['D'] = np.where(df['A'] == 0, 1 - df['C'], 1 - df['B'])
print (df)
   A    B    C    D
0  1  0.7  0.3  0.3
1  0  0.2  0.8  0.2
2  0  0.8  0.2  0.8
3  1  0.6  0.4  0.4
4  1  0.9  0.1  0.1
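
Since the two-way np.where treats every value other than 0 as if it were 1, it may be worth guarding the 0/1 assumption first. The sketch below is one way to do that. The variable `only_binary` and the error message are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 0, 0, 1, 1],
    "B": [0.7, 0.2, 0.8, 0.6, 0.9],
    "C": [0.3, 0.8, 0.2, 0.4, 0.1],
})

# Guard: np.where with a single condition is only correct here
# when A holds nothing but 0 and 1 (NaN also fails this check).
only_binary = df["A"].isin([0, 1]).all()
if not only_binary:
    raise ValueError("column A contains values other than 0 and 1")

df["D"] = np.where(df["A"] == 0, 1 - df["C"], 1 - df["B"])
print(df)
```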

If column A can contain other values as well (the most general solution), use numpy.select:

print (df)
   A    B    C
0  1  0.7  0.3
1  0  0.2  0.8
2  0  0.8  0.2
3  1  0.6  0.4
4  3  0.9  0.1 <- added 3

m1 = df['A'] == 0
m2 = df['A'] == 1
df['D'] = np.select([m1, m2], [1 - df['C'], 1 - df['B']], default=np.nan)
print (df)
   A    B    C    D
0  1  0.7  0.3  0.3
1  0  0.2  0.8  0.2
2  0  0.8  0.2  0.8
3  1  0.6  0.4  0.4
4  3  0.9  0.1  NaN
jezrael

np.select() and np.where() are the way to go.

As one more option, you can also assign through df.loc with a boolean mask:

df.loc[df.A == 1, 'D'] = 1 - df.B
df.loc[df.A == 0, 'D'] = 1 - df.C
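
A self-contained sketch of the same idea follows, using the frame with the extra value 3 in A from the other answer. The right-hand side is the full Series. pandas aligns it on the index, so only the masked rows are written, and rows matching neither mask should be left as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 0, 0, 1, 3],
    "B": [0.7, 0.2, 0.8, 0.6, 0.9],
    "C": [0.3, 0.8, 0.2, 0.4, 0.1],
})

# The first assignment creates column D; unselected rows start as NaN.
df.loc[df["A"] == 1, "D"] = 1 - df["B"]
# The second assignment fills the rows where A == 0.
df.loc[df["A"] == 0, "D"] = 1 - df["C"]
print(df)
```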
rafaelc