
I have a DataFrame and I want to add a column whose value depends on an if/elif condition evaluated on each row. I am using an if/elif statement, but it is not working. Can conditional statements not be used on a DataFrame?

Here is my code:

import pandas as pd
df = pd.DataFrame({'c1': ['a', 'a', 'q', 'a'],
              'c2': ['b', 'e', 'b', 'f'],
               'c3': ['c', 'f', 'c', 'd']})

if [(df['c1']=='a') & (df['c2']=='b')]:
    df['q']= df['c1'] + '+' + df['c2']
elif (df['c1']=='a' & df['c2']=='e'):
    df['q'] = df['c1'] + '*' + df['c2']
else:
    df['q'] = df['c1'] + '-' + df['c2']

The new column 'q' has contents: 'a+b', 'a+e', 'q+b', 'a+f'

But I want it to be: 'a+b', 'a*e', 'q-b', 'a-f'

sky_bird

2 Answers


Use numpy.select, which is a more readable alternative to multiple nested np.where calls:

import numpy as np

m1 = (df['c1']=='a') & (df['c2']=='b')
m2 = (df['c1']=='a') & (df['c2']=='e')

a1 = df['c1'] + '+' + df['c2']
a2 = df['c1'] + '*' + df['c2']
a3 = df['c1'] + '-' + df['c2']

df['q'] = np.select([m1, m2], [a1, a2], default=a3)
print(df)

  c1 c2 c3    q
0  a  b  c  a+b
1  a  e  f  a*e
2  q  b  c  q-b
3  a  f  d  a-f
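
As for why the original if/elif always produced '+': a small sketch, assuming the same sample data as the question. Wrapping the mask in square brackets creates a non-empty list, which Python always treats as true, so the first branch runs for the whole column and the elif is never reached. Without the brackets, pandas raises a ValueError, because a Series of several booleans has no single truth value. The variable names below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'c1': ['a', 'a', 'q', 'a'],
                   'c2': ['b', 'e', 'b', 'f']})

# Element-wise mask: one boolean per row, not a single True/False
mask = (df['c1'] == 'a') & (df['c2'] == 'b')

# A list containing one Series is non-empty, so `if [mask]:` is always taken
list_is_truthy = bool([mask])

# A bare multi-element Series cannot be reduced to one boolean; pandas raises
try:
    bool(mask)
    series_raised = False
except ValueError:
    series_raised = True

print(list_is_truthy, series_raised)
```

Note also that `&` binds more tightly than `==` in Python, so each comparison needs its own parentheses, as in the masks m1 and m2 above.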
jezrael

The vectorized version of an if statement is np.where. I assigned the conditions and the possible outcomes to variables to improve readability, because nested np.where calls can become hard to follow.

import numpy as np

cond1 = (df['c1']=='a') & (df['c2']=='b')
cond2 = (df['c1']=='a') & (df['c2']=='e')

case1 = df['c1'] + '+' + df['c2']
case2 = df['c1'] + '*' + df['c2']
case3 = df['c1'] + '-' + df['c2']

df['q'] = np.where(cond1, case1, np.where(cond2, case2, case3))

df
Out: 
  c1 c2 c3    q
0  a  b  c  a+b
1  a  e  f  a*e
2  q  b  c  q-b
3  a  f  d  a-f
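
If the plain if/elif form is preferred for readability, one possible alternative (not part of the answer above, and usually slower on large frames) is to apply an ordinary function row by row. The sketch below assumes the question's sample data; `combine` is a hypothetical helper name. It also checks that the row-wise result matches the nested np.where result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': ['a', 'a', 'q', 'a'],
                   'c2': ['b', 'e', 'b', 'f'],
                   'c3': ['c', 'f', 'c', 'd']})

def combine(row):
    # Hypothetical helper: here each value is a scalar, so if/elif works
    if row['c1'] == 'a' and row['c2'] == 'b':
        return row['c1'] + '+' + row['c2']
    elif row['c1'] == 'a' and row['c2'] == 'e':
        return row['c1'] + '*' + row['c2']
    return row['c1'] + '-' + row['c2']

# axis=1 passes one row at a time to combine
row_wise = df.apply(combine, axis=1)

# Same logic with nested np.where, for comparison
cond1 = (df['c1'] == 'a') & (df['c2'] == 'b')
cond2 = (df['c1'] == 'a') & (df['c2'] == 'e')
case1 = df['c1'] + '+' + df['c2']
case2 = df['c1'] + '*' + df['c2']
case3 = df['c1'] + '-' + df['c2']
vectorized = np.where(cond1, case1, np.where(cond2, case2, case3))

print(row_wise.tolist() == list(vectorized))
```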
ayhan