I have a dataframe:

import pandas as pd
data = {'A': ['SA01', '0007', 'SA06', '0198', 'SA06'], 
        'B': [2012, 2012, 2013, 2014, 2014], }
df = pd.DataFrame(data)

df = A     B
     SA01  2012
     0007  2012
     SA06  2013
     0198  2014
     SA06  2014

I want to use df.apply or another pandas function to add a column df['C'] as follows:

df = A     B     C
     SA01  2012  M
     0007  2012  F
     SA06  2013  M
     0198  2014  F
     SA06  2014  M

If df['A'] contains the substring 'SA', then df['C'] should be 'M'; otherwise it should be 'F'. How can I do this?

Hari Krishnan
vincentlai

1 Answer

Use numpy.where with a boolean mask created by str.contains or str.startswith:

import numpy as np

df['new'] = np.where(df['A'].str.contains('SA'), 'M', 'F')
# alternative: match only values that begin with 'SA'
# df['new'] = np.where(df['A'].str.startswith('SA'), 'M', 'F')
print(df)
      A     B new
0  SA01  2012   M
1  0007  2012   F
2  SA06  2013   M
3  0198  2014   F
4  SA06  2014   M
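
The two masks give the same result on the question's data, but they can differ on other inputs. The sketch below is only an illustration: the values '01SA' and None are hypothetical and do not appear in the question. It also assumes you want 'SA' matched literally (regex=False) and missing values treated as non-matches (na=False), which may or may not suit your data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: '01SA' and None are not in the question; they are added
# only to show where the two masks differ and how missing values behave.
s = pd.Series(['SA01', '0007', '01SA', None])

# regex=False matches 'SA' as a literal substring anywhere in the value;
# na=False makes missing values count as non-matches instead of NaN.
contains_mask = s.str.contains('SA', regex=False, na=False)

# startswith only matches values that begin with 'SA'.
startswith_mask = s.str.startswith('SA', na=False)

labels_contains = np.where(contains_mask, 'M', 'F')
labels_startswith = np.where(startswith_mask, 'M', 'F')

print(labels_contains.tolist())    # ['M', 'F', 'M', 'F']
print(labels_startswith.tolist())  # ['M', 'F', 'F', 'F']
```

If 'SA' is always a prefix in your data, startswith states that intent more precisely; contains is the closer match to "contains substring" as worded in the question.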
jezrael