Create a new dataframe by filtering characters from an existing dataframe

Question

I have a pandas dataframe:

id         name

63         T台
64        4S店
66    江南style
68        1号店
69         小S
70         大S
72          一
73         一一
74        一一二
77       一一列举
79       一一对应
80        一一记
81       一一道来
82         一丁
84        一丁点

I'm trying to create a new dataframe containing only the rows whose name column has none of the characters in this list:

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '%', '+']
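To experiment with the answers below, the sample data can be rebuilt as follows. This is a minimal sketch. It assumes id is an ordinary column rather than the index, which matches the printed outputs in the answers. It also builds letters from string.ascii_letters, which gives the same 52 Latin letters in the same order as the list above, followed by % and +.

```python
import string

import pandas as pd

# Sample data copied from the question; "id" is assumed to be a regular column
df = pd.DataFrame({
    "id": [63, 64, 66, 68, 69, 70, 72, 73, 74, 77, 79, 80, 81, 82, 84],
    "name": ["T台", "4S店", "江南style", "1号店", "小S", "大S", "一", "一一",
             "一一二", "一一列举", "一一对应", "一一记", "一一道来", "一丁", "一丁点"],
})

# string.ascii_letters is "abc...zABC...Z"; list() splits it into single characters
letters = list(string.ascii_letters) + ["%", "+"]
print(df.shape, len(letters))
```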

I found several somewhat similar questions (like this one), but they filter on specific values (e.g. df[(df['count'] == '2') & (df['price'] == '100')]) rather than on a list of values.

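The output should be a new dataframe without the rows whose name contains any of these characters: ids 63, 64, 66, 69 and 70 in this example (id 68, "1号店", contains none of them and is kept).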

I tried something similar to get a list of True/False values that I could use to filter the dataframe:

('a' not in current_dataframe['name'])

But this outputs only a single value instead of one value per row:

>>> True
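The single value comes from how in works on a pandas Series: like a dict, it tests membership in the index labels, not in the values, so 'a' not in current_dataframe['name'] asks whether 'a' is an index label. An element-wise check needs a vectorized string method such as Series.str.contains. A minimal sketch, using a small hypothetical frame named current_dataframe:

```python
import pandas as pd

# Hypothetical two-row frame for illustration (default index: 0, 1)
current_dataframe = pd.DataFrame({"name": ["abc", "一一"]})
s = current_dataframe["name"]

# `in` checks the index labels, so each test yields a single bool
print("a" in s)  # False: "a" is not an index label
print(0 in s)    # True: 0 is an index label

# Element-wise test: regex=False treats "a" as a literal substring
mask = ~s.str.contains("a", regex=False)
print(mask.tolist())  # [False, True]
```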

score 2 · Accepted Answer · answered Jul 08 '21 at 17:47

You can use a regular expression:

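import re

# Alternation such as "a|b|...|%|\+"; re.escape keeps "+" literal (re.I is redundant here, both cases are listed)
pat = re.compile("|".join(re.escape(ch) for ch in letters), flags=re.I)
# str.contains returns a boolean Series; ~ negates it to keep rows with no match
print(df[~df["name"].str.contains(pat)])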

Prints:

    id  name
3   68   1号店
6   72     一
7   73    一一
8   74   一一二
9   77  一一列举
10  79  一一对应
11  80   一一记
12  81  一一道来
13  82    一丁
14  84   一丁点
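Because every entry in letters is a single character, the alternation can also be written as one regex character class. The sketch below is an alternative to the pattern above, not part of the answer. It assumes the same letters list and escapes each character so that + (or a ], ^ or - if one were added) stays literal inside the brackets.

```python
import re

import pandas as pd

letters = [chr(c) for c in range(ord("a"), ord("z") + 1)] + \
          [chr(c) for c in range(ord("A"), ord("Z") + 1)] + ["%", "+"]

# Small sample taken from the question's data
df = pd.DataFrame({"id": [63, 66, 68, 72], "name": ["T台", "江南style", "1号店", "一"]})

# One character class, e.g. "[abc...%\+]"; re.escape keeps metacharacters literal
char_class = "[" + "".join(re.escape(ch) for ch in letters) + "]"

# A string pattern is compiled by str.contains itself (regex=True is the default)
new_df = df[~df["name"].str.contains(char_class)]
print(new_df)
```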

score 1 · Answer 2 · answered Jul 08 '21 at 17:51

With a list comprehension:

# True for each name that contains none of the characters in letters
to_keep = [not any(letter in val for letter in letters) for val in df.name]
new_df = df[to_keep]

Here to_keep is a boolean list whose entry is True when none of the characters in letters appears in the corresponding value of df.name. Boolean indexing then keeps only those rows, giving:

>>> new_df
    id  name
3   68   1号店
6   72     一
7   73    一一
8   74   一一二
9   77  一一列举
10  79  一一对应
11  80   一一记
12  81  一一道来
13  82    一丁
14  84   一丁点
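The same row test can be expressed with a set: set(letters).isdisjoint(val) is True exactly when no character of val is in letters, and it makes one pass over each name instead of one substring scan per letter. This is a variation on the answer above, sketched on a small sample from the question; it relies on every entry of letters being a single character.

```python
import pandas as pd

letters = [chr(c) for c in range(ord("a"), ord("z") + 1)] + \
          [chr(c) for c in range(ord("A"), ord("Z") + 1)] + ["%", "+"]
forbidden = set(letters)  # built once, reused for every row

# Small sample taken from the question's data
df = pd.DataFrame({"id": [64, 69, 74, 84], "name": ["4S店", "小S", "一一二", "一丁点"]})

# isdisjoint iterates over the characters of each name and checks set membership
to_keep = [forbidden.isdisjoint(val) for val in df["name"]]
new_df = df[to_keep]
print(new_df)
```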
