
I have a problem similar to this question, but with the opposite challenge: instead of a removal list, I have a keep list, i.e. a list of strings I'd like to keep. How can I use this keep list to filter the strings in the column so that the unwanted ones are dropped and the wanted ones are retained?

import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Mitty, Kitty",
            "Kandy, Puppy",
            "Judy, Micky, Loudy",
            "Cindy, Judy",
            "Kitty, Wicky",
        ],
    }
)

   ID                name
0   1        Mitty, Kitty
1   2        Kandy, Puppy
2   3  Judy, Micky, Loudy
3   4         Cindy, Judy
4   5        Kitty, Wicky

To_keep_lst = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]
codedancer

2 Answers


Use Series.str.findall with Series.str.join:

To_keep_lst = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]

df['name'] = df['name'].str.findall('|'.join(To_keep_lst)).str.join(', ')
print(df)
   ID          name
0   1         Kitty
1   2         Kandy
2   3  Micky, Loudy
3   4              
4   5  Kitty, Wicky
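One caveat with a plain `'|'.join(...)` pattern: it also matches kept names that appear inside longer names, and any regex metacharacters in the names would be interpreted as regex syntax. A minimal sketch of a stricter pattern, assuming the names begin and end with word characters so that `\b` word boundaries apply; the names "Kittyanne" and "Mickyson" are hypothetical data added only to show the difference:

```python
import re

import pandas as pd

To_keep_lst = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]

# Hypothetical data: "Kittyanne" and "Mickyson" contain kept names as substrings
s = pd.Series(["Mitty, Kitty", "Kittyanne, Loudy", "Mickyson, Wicky"])

# Plain alternation, as in the answer above: also matches inside longer names
plain = '|'.join(To_keep_lst)

# Escape each name (in case it contains regex metacharacters) and require
# word boundaries on both sides so only whole names match
bounded = r'\b(?:' + '|'.join(map(re.escape, To_keep_lst)) + r')\b'

print(s.str.findall(plain).str.join(', ').tolist())
# ['Kitty', 'Kitty, Loudy', 'Micky, Wicky']
print(s.str.findall(bounded).str.join(', ').tolist())
# ['Kitty', 'Loudy', 'Wicky']
```

The group is non-capturing (`(?:...)`) so that `findall` keeps returning the whole match rather than a captured group.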
jezrael

Use a comprehension to keep only the names that are in the keep list:

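keep_set = set(To_keep_lst)  # a set makes each membership test O(1)

def keep_names(x):
    # split the cell on ", ", keep only wanted names, then re-join them
    return ', '.join(n for n in x.split(', ') if n in keep_set)

df['name'] = df['name'].apply(keep_names)
print(df)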

# Output:
   ID          name
0   1         Kitty
1   2         Kandy
2   3  Micky, Loudy
3   4              
4   5  Kitty, Wicky

Note: @jezrael's answer is much faster than mine.
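The speed comparison above is the answerer's own claim. Here is a minimal sketch for checking it on your own data, assuming a frame built by tiling the question's sample 10,000 times (an arbitrary size chosen for illustration). The helper names `with_findall` and `with_comprehension` are hypothetical. Timings depend on the data, hardware, and pandas version, so no results are shown here.

```python
import timeit

import pandas as pd

# The question's sample frame, tiled to 50,000 rows (arbitrary size)
base = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "name": ["Mitty, Kitty", "Kandy, Puppy", "Judy, Micky, Loudy",
             "Cindy, Judy", "Kitty, Wicky"],
})
big = pd.concat([base] * 10_000, ignore_index=True)
To_keep_lst = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]
keep_set = set(To_keep_lst)


def with_findall(s):
    # regex approach: extract the matching names, then re-join them
    return s.str.findall('|'.join(To_keep_lst)).str.join(', ')


def with_comprehension(s):
    # split/filter/join approach, applied row by row
    return s.apply(lambda x: ', '.join(n for n in x.split(', ') if n in keep_set))


# Check that both approaches agree before comparing their speed
assert with_findall(big['name']).equals(with_comprehension(big['name']))

for fn in (with_findall, with_comprehension):
    seconds = timeit.timeit(lambda: fn(big['name']), number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```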

Corralien