
My pandas dataframe:

ID  String         Pet
1   this is a cat
2   hello dog

I would like to extract the pet from the 'String' column and fill the 'Pet' column accordingly. The third row should be left empty rather than filled with a default value.

My attempt:

import numpy as np

# Fall back to '0' when neither "cat" nor "dog" is found
df['Pet'] = np.where(df['String'].str.contains("cat"), "cat",
                     np.where(df['String'].str.contains("dog"), "dog", '0'))

Unfortunately, my attempt also fills the third row (which contains no pet) with '0'.
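A minimal sketch that keeps the nested np.where from the attempt and only swaps the '0' fallback for None, so rows without a match hold a missing value instead of a placeholder. The third row's text ("nothing to see here") is a hypothetical stand-in, since the original third row is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the third row is an assumption and contains no pet.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "String": ["this is a cat", "hello dog", "nothing to see here"],
})

# Same nested np.where as the attempt, but None is the fallback, so numpy
# builds an object array and unmatched rows stay missing instead of "0".
df["Pet"] = np.where(
    df["String"].str.contains("cat"), "cat",
    np.where(df["String"].str.contains("dog"), "dog", None),
)

print(df)
```

With this change, df["Pet"].isna() flags only the third row.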

Thank you in advance for your help!

Gonçalo Peres
Bloom

2 Answers


One approach is to first create a list of the strings you consider pets, for example:

pets = ['cat', 'dog', 'bird']

Then use pandas.Series.str.extract with a regular expression that matches any item in that list (the re module supplies the case-insensitivity flag):

import re
    
df['Pet'] = df['String'].str.extract(f'({"|".join(pets)})', flags=re.IGNORECASE, expand=False)

[Out]:

   ID         String  Pet
0   1  this is a cat  cat
1   2      hello dog  dog

Note:

  • flags=re.IGNORECASE makes this approach case-insensitive.
  • With expand=False the result is a Series, and rows with no match get NaN, so they stay empty.
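A slightly hardened variant of the same str.extract idea, as a sketch. Two additions are not part of the answer and are assumptions about what you may want: re.escape on each pet name (in case a name contains regex metacharacters) and \b word boundaries so that "cat" does not match inside "category". Rows 3 and 4 of the sample data are hypothetical.

```python
import re

import pandas as pd

# Hypothetical data: rows 3 and 4 are assumptions added to show the edge cases.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "String": ["this is a cat", "hello DOG", "no pet here", "a category"],
})

pets = ["cat", "dog", "bird"]

# Escape each name, join them as alternatives, and require whole-word matches.
pattern = r"\b(" + "|".join(map(re.escape, pets)) + r")\b"

# expand=False returns a Series; rows without a match become NaN (empty).
df["Pet"] = df["String"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
print(df)
```

The match is returned exactly as written in the text, so the second row yields "DOG"; chain .str.lower() if you need normalized labels.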
Gonçalo Peres

It looks like you could use a regex with str.extract, plus fillna if you want a default value:

animals = ['cat', 'dog']
regex = '|'.join(animals)

df['Pet'] = df['String'].str.extract(f'(?i)({regex})', expand=False).fillna(0)

output:

   ID         String  Pet
0   1  this is a cat  cat
1   2      hello dog  dog
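Since the question asks for the unmatched row to stay empty, here is a small sketch contrasting the extract result with and without fillna. The third row's text and the "none" default are hypothetical illustrative choices; the answer itself uses fillna(0).

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "String": ["this is a cat", "hello dog", "nothing to see here"],  # row 3 is hypothetical
})

animals = ["cat", "dog"]
regex = "|".join(animals)

# (?i) makes the pattern case-insensitive; unmatched rows come back as NaN.
extracted = df["String"].str.extract(f"(?i)({regex})", expand=False)

# Without fillna: the third row stays empty, as the question requires.
df["Pet"] = extracted

# With fillna: the third row receives a default value instead.
df["Pet_with_default"] = extracted.fillna("none")
print(df)
```

In other words, drop the .fillna(...) call when unmatched rows should remain empty.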
mozway