Keep strings present in a list from a column in pandas

Question

I have a problem similar to this question, but with the opposite goal. Instead of a removal list, I have a keep list: a list of strings I'd like to keep. How can I use the keep list to filter the column so that only the wanted strings remain and the unwanted ones are dropped?

import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Mitty, Kitty",
            "Kandy, Puppy",
            "Judy, Micky, Loudy",
            "Cindy, Judy",
            "Kitty, Wicky",
        ],
    }
)

   ID                name
0   1        Mitty, Kitty
1   2        Kandy, Puppy
2   3  Judy, Micky, Loudy
3   4         Cindy, Judy
4   5        Kitty, Wicky

To_keep_lst = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]

score 2 · Accepted Answer · answered Dec 02 '21 at 11:38

Use Series.str.findall with Series.str.join:

To_keep_lst = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]

df['name'] = df['name'].str.findall('|'.join(To_keep_lst)).str.join(', ')
print(df)
   ID          name
0   1         Kitty
1   2         Kandy
2   3  Micky, Loudy
3   4              
4   5  Kitty, Wicky
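
One caveat with the pattern above: '|'.join(To_keep_lst) matches substrings anywhere in the cell, so "Kitty" would also be found inside a longer name, and any keep entry containing regex metacharacters would be read as regex syntax. Below is a minimal sketch of a stricter variant, assuming the names are whole words; the sample row "Kittyhawk, Kandy" and the variable names to_keep, pattern and result are made up for illustration.

```python
import re

import pandas as pd

# Illustrative data; "Kittyhawk" is a made-up name that contains "Kitty".
df = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "name": ["Mitty, Kitty", "Kittyhawk, Kandy", "Cindy, Judy"],
    }
)
to_keep = ["Kitty", "Kandy"]

# Escape each entry so it is matched literally, and wrap the alternation
# in word boundaries so only whole names match.
pattern = r"\b(?:" + "|".join(map(re.escape, to_keep)) + r")\b"

# findall returns a list of matches per row; join turns it back into a string.
result = df["name"].str.findall(pattern).str.join(", ")
print(result.tolist())
```

With the plain '|'.join pattern, the second row would come back as "Kitty, Kandy"; with the word boundaries it should keep only "Kandy".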

score 1 · Answer 2 · answered Dec 02 '21 at 12:01

Use a list comprehension to keep only the names that appear in the keep list:

keep_names = lambda x: ', '.join([n for n in x.split(', ') if n in To_keep_lst])
df['name'] = df['name'].apply(keep_names)
print(df)

# Output:
   ID          name
0   1         Kitty
1   2         Kandy
2   3  Micky, Loudy
3   4              
4   5  Kitty, Wicky

Note: @jezrael's answer is much faster than mine.
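
A possible variation on the comprehension, sketched under two assumptions: the keep list is turned into a set so each membership test is a hash lookup rather than a list scan, and each piece is stripped so that inconsistent spacing after the commas does not cause misses. The helper name keep_only and the variable to_keep are hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Mitty, Kitty",
            "Kandy, Puppy",
            "Judy, Micky, Loudy",
            "Cindy, Judy",
            "Kitty, Wicky",
        ],
    }
)

# A set gives constant-time membership checks on average.
to_keep = {"Kitty", "Kandy", "Micky", "Loudy", "Wicky"}


def keep_only(cell, allowed=to_keep):
    """Return the comma-separated names in `cell` that are in `allowed`."""
    # Split on the bare comma and strip, so "A,B" and "A,  B" both work.
    parts = (part.strip() for part in cell.split(","))
    return ", ".join(part for part in parts if part in allowed)


result = df["name"].apply(keep_only)
print(result.tolist())
```

This keeps the original order of names within each cell and yields an empty string for rows with no match, the same as the output shown above.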
