I'm trying to create a new column in a pandas DataFrame based on whether a string is contained in another column. I'm using np.select based on this post. Here is an example DataFrame and an example function to create the new column:

import numpy as np
import pandas as pd

df = pd.DataFrame({'column': ['one', 'ones', 'other', 'two', 'twos', 'others', 'three', 'threes']})

def add(df):

    conditions = [
        ('one' in df['column']),
        ('two' in df['column']),
        ('three' in df['column']),
        ('other' in df['column'])] 

    choices = [1, 2, 3, 0]
    df['Int'] = np.select(conditions, choices, default=0)

    return df

new_df=add(df)

The output I'm getting is

   column  Int
0     one    0
1    ones    0
2   other    0
3     two    0
4    twos    0
5  others    0
6   three    0
7  threes    0

And what I want is

   column  Int
0     one    1
1    ones    1
2   other    0
3     two    2
4    twos    2
5  others    0
6   three    3
7  threes    3

What am I doing wrong?

Novice

1 Answer

To test for substrings, use Series.str.contains:

 conditions = [
        (df['column'].str.contains('one')),
        (df['column'].str.contains('two')),
        (df['column'].str.contains('three')),
        (df['column'].str.contains('other'))] 
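For context on why the original conditions never match: the `in` operator applied to a Series checks the index labels, not the values, and returns a single bool instead of one result per row. The snippet below is a small sketch of that difference on a shortened copy of the example data; the variable names `label_check`, `index_check`, and `mask` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'column': ['one', 'ones', 'other', 'two']})

# `in` on a Series tests the index labels (0, 1, 2, 3), not the values,
# and yields one Python bool rather than a value per row.
label_check = 'one' in df['column']   # 'one' is not an index label
index_check = 0 in df['column']       # 0 is an index label

# Series.str.contains tests every value and returns a boolean Series.
mask = df['column'].str.contains('one')

print(label_check, index_check, mask.tolist())
# False True [True, True, False, False]
```

Because every condition in the question collapses to a single False, np.select has nothing to match and falls back to the default for all rows.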

For an exact match, use Series.eq or ==:

 conditions = [
        (df['column'].eq('one')),
        (df['column'].eq('two')),
        (df['column'].eq('three')),
        (df['column'].eq('other'))] 

 conditions = [
        (df['column'] == 'one'),
        (df['column'] == 'two'),
        (df['column'] == 'three'),
        (df['column'] == 'other')] 

print (new_df)
   column  Int
0     one    1
1    ones    1
2   other    0
3     two    2
4    twos    2
5  others    0
6   three    3
7  threes    3
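Putting the pieces together, here is a minimal end-to-end sketch, assuming the same example data as the question. The helper name `add_int` and its `exact` flag are hypothetical, introduced only to show the substring and exact-match variants side by side; `regex=False` is added so each key is treated as a literal substring.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column': ['one', 'ones', 'other', 'two', 'twos',
                              'others', 'three', 'threes']})

def add_int(frame, exact=False):
    """Return a copy of frame with an 'Int' column built by np.select."""
    col = frame['column']
    keys = ['one', 'two', 'three', 'other']
    if exact:
        # Whole-value comparison: 'ones' does not equal 'one'.
        conditions = [col.eq(k) for k in keys]
    else:
        # Literal substring test: 'ones' contains 'one'.
        conditions = [col.str.contains(k, regex=False) for k in keys]
    out = frame.copy()
    # The first condition that is True for a row decides its value.
    out['Int'] = np.select(conditions, [1, 2, 3, 0], default=0)
    return out

substring_df = add_int(df)
exact_df = add_int(df, exact=True)

print(substring_df['Int'].tolist())  # [1, 1, 0, 2, 2, 0, 3, 3]
print(exact_df['Int'].tolist())      # [1, 0, 0, 2, 0, 0, 3, 0]
```

The substring variant reproduces the desired output from the question, while the exact-match variant leaves 'ones', 'twos', and 'threes' at the default of 0.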
jezrael
  • I am facing a similar issue. Can you take a look here, please? https://stackoverflow.com/questions/63642173/pandas-apply-merge-operations-from-a-column – Aaditya Ura Sep 03 '20 at 20:21