How to filter based on some part of a value of rows in a pandas dataframe

Question

I have a pandas dataframe as below:

   +------+---+
   |Name  |age|
   +------+---+
   |Mona  |12 |
   +------+---+
   |Monika|25 |
   +------+---+
   |Tomas |3  |
   +------+---+
   |Ilyas |47 |
   +------+---+

Now I want to assign values in a new column depending on whether the Name column contains a given substring. For example, if Name contains Mon we assign text, and if it contains as we assign city.

So the output must be:

   +------+---+----+
   |Name  |age|new |
   +------+---+----+
   |Mona  |12 |text|
   +------+---+----+
   |Monika|25 |text|
   +------+---+----+
   |Tomas |3  |city|
   +------+---+----+
   |Ilyas |47 |city|
   +------+---+----+

I wrote the following code, but it didn't work:

df['new'] = np.nan    
df['new'] = df['new'].mask( 'Mon' in df['Name'], 'text')    
df['new'] = df['new'].mask( 'as' in df['Name'], 'city')

And what if the word is `Monas`? Does priority go to the first condition or last? Your attempt seems to indicate the latter. — ALollz, Feb 20 '19 at 20:35
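
A note on why the attempt above has no effect: as far as I know, `in` applied to a Series tests the index labels rather than the values, so `'Mon' in df['Name']` evaluates to a single boolean instead of a row-wise mask. A minimal sketch of the difference from `str.contains`, with the sample frame rebuilt from the question (the variable names `scalar_check` and `row_mask` are only illustrative):

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({"Name": ["Mona", "Monika", "Tomas", "Ilyas"],
                   "age": [12, 25, 3, 47]})

# `in` on a Series looks at the index labels (0, 1, 2, 3), not the values
scalar_check = "Mon" in df["Name"]

# str.contains tests each value and returns a boolean Series
row_mask = df["Name"].str.contains("Mon")

print(scalar_check)        # a single bool
print(row_mask.tolist())   # one bool per row
```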

score 1 · Answer 1 · answered Feb 20 '19 at 20:36

If your logic is going to get complicated, you may want to encapsulate your mapping into a function:

def map_func(name):
    if 'Mon' in name:
        return 'text'
    elif 'as' in name:
        return 'city'

df['Name'].map(map_func)

Result:

0    text
1    text
2    city
3    city
Name: Name, dtype: object
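
To store the result in the new column the question asks for, the mapped Series can be assigned back. A minimal sketch, reusing `map_func` from this answer; the extra row `Bob` is a hypothetical name added only to show that a name matching neither substring falls through to `None` and ends up as a missing value:

```python
import pandas as pd

# Question's sample data plus one hypothetical non-matching row
df = pd.DataFrame({"Name": ["Mona", "Monika", "Tomas", "Ilyas", "Bob"],
                   "age": [12, 25, 3, 47, 30]})

def map_func(name):
    # The first matching branch wins, so "Mon" takes priority over "as"
    if "Mon" in name:
        return "text"
    elif "as" in name:
        return "city"
    # No match: the function implicitly returns None

df["new"] = df["Name"].map(map_func)
print(df)
```

With this ordering, a name such as `Monas` would get `text`, which answers the priority question raised in the comment on the question for this particular approach.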

score 0 · Answer 2 · answered Feb 20 '19 at 20:33

This should do:

df['new']=np.where(df.Name.str.contains('Mon'), 'text', 'city')

np.where works like Excel's IF: where the condition is met it fills in 'text', otherwise it fills in 'city'.

Juan C

I think this case calls for nested `where` or `select`, since it seems that `NaN` should be filled in when neither `Mon` nor `as` is found (i.e. the column is initialized as `df['new'] = np.nan`) – ALollz Feb 20 '19 at 20:38
That's right. I'm not quite sure how to do it without starting to nest `np.where`s, though – Juan C Feb 20 '19 at 21:09
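
One way to avoid the nesting, as the first comment hints, is `np.select`, which takes a list of conditions and a matching list of choices and, for each row, uses the first condition that is true. A minimal sketch on the question's sample data; the extra row `Bob` is hypothetical, and passing `default=None` is an assumption made here so that unmatched rows stay missing:

```python
import numpy as np
import pandas as pd

# Question's sample data plus one hypothetical non-matching row
df = pd.DataFrame({"Name": ["Mona", "Monika", "Tomas", "Ilyas", "Bob"],
                   "age": [12, 25, 3, 47, 30]})

conditions = [
    df["Name"].str.contains("Mon").to_numpy(),  # checked first
    df["Name"].str.contains("as").to_numpy(),   # used only where the first is False
]
choices = ["text", "city"]

# Rows matching no condition receive the default
df["new"] = np.select(conditions, choices, default=None)
print(df)
```

Because the conditions are evaluated in list order, reordering the two lists is enough to change which rule wins for a name containing both substrings.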

kofffii · Answer 3 · 2019-02-20T20:41:55.210

df.loc[df['Name'].str.endswith('as'), 'new'] = 'city'
df.loc[df['Name'].str.startswith('Mon'), 'new'] = 'text'

The `.str` accessor has `startswith` and `endswith` methods for this kind of problem. There's probably a way to do it in a single line, but nothing comes to mind right now.

`loc` is always helpful for selecting and updating data based on a condition.

Edit: Juan C is right, it should be 'contains', not 'startswith'. My bad.
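
Following the edit, here is a sketch of the same `loc` approach with `str.contains` in place of the prefix and suffix checks. With sequential assignments the later one overwrites the earlier, so the rule that should take priority goes last; putting `Mon` last is an assumption about the desired priority, which the question leaves open:

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({"Name": ["Mona", "Monika", "Tomas", "Ilyas"],
                   "age": [12, 25, 3, 47]})

# loc creates the column on first use; rows matching no rule would stay NaN
df.loc[df["Name"].str.contains("as"), "new"] = "city"
# Assigned last, so "Mon" wins for a name containing both substrings
df.loc[df["Name"].str.contains("Mon"), "new"] = "text"
print(df)
```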

score 0 · Answer 4 · answered Feb 20 '19 at 20:41

I think this would be your best bet, since it will handle case-insensitive matches and fill NaN where neither condition is met.

import numpy as np

     Name  Age
0    Mona   12
1  Monika   25
2   Tomas    3
3   Ilyas   47

default = np.where(df.Name.str.contains('as', case=False), 'city', np.nan)
cond = np.where(df.Name.str.contains('Mon', case=False), 'text', default)

df.assign(new=cond)

     Name  Age   new
0    Mona   12  text
1  Monika   25  text
2   Tomas    3  city
3   Ilyas   47  city
