Do these characters have some sort of mapping function? "[1]", "[2]", "[3]",...,"[n]"

Question

I am using this line of code

df_mask = ~df[new_col_titles[:1]].apply(lambda x : x.str.contains('|'.join(filter_list), flags=re.IGNORECASE)).any(1)

to create a mask for my df. The filter list is

filter_list = ["[1]", "[2]", "[3]", "[4]", "[5]", "[6]", "[7]", "[8]","[9]",..."[n]"]

But I am getting weird results. I was hoping it would filter out only the rows in column 0 of the df that contain [1]...[n]. It doesn't: it also filters out rows that don't contain those elements. There is something of a pattern to it, though. It filters out rows that have numbers alongside other "characters", by which I mean things like £55, 2010), 55*, 55 *

Can anyone explain what is going on, and is there a workaround for this?

it's tough to visualize what's going on. Can you provide sample input and expected output? https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — David Erickson, Sep 01 '20 at 22:48
`[]` has special meaning in regular expressions. You need to escape it if you want to match it literally. — Barmar, Sep 01 '20 at 22:53
`[1]` matches the digit `1`, it doesn't match the square brackets. — Barmar, Sep 01 '20 at 22:54
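
A minimal sketch of what the comments above describe, using only the standard `re` module. The sample strings come from the question; the shortened `filter_list` (stopping at 9) and the extra strings `"see [3]"` and `"no digits here"` are assumptions added for illustration.

```python
import re

# Shortened stand-in for the question's filter_list (assumption: n = 9).
filter_list = [f"[{i}]" for i in range(1, 10)]

# "£55", "2010)" and "55*" are from the question; the last two are made up.
samples = ["£55", "2010)", "55*", "see [3]", "no digits here"]

# Unescaped: each "[k]" is a character class that matches the single digit k.
raw_pattern = "|".join(filter_list)

# Escaped: each "[k]" only matches the literal three-character string.
escaped_pattern = "|".join(re.escape(f) for f in filter_list)

raw_hits = [s for s in samples if re.search(raw_pattern, s)]
escaped_hits = [s for s in samples if re.search(escaped_pattern, s)]

print(raw_hits)      # every sample that contains any digit from 1 to 9
print(escaped_hits)  # only the sample that contains a bracketed number
```

The unescaped pattern is effectively "any digit from 1 to 9", which would explain why rows such as £55 or 2010) are caught even though they contain no square brackets.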

score 1 · Accepted Answer · answered Sep 01 '20 at 22:56

If you want to match the items in filter_list literally, use re.escape() to escape the special characters. As a regular expression, [1] is a character class that matches just the digit 1, not the string [1].

df_mask = ~df[new_col_titles[:1]].apply(lambda x : x.str.contains('|'.join(re.escape(f) for f in filter_list), flags=re.IGNORECASE)).any(axis=1)

See Reference - What does this regex mean?
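
For a quick check of the escaped approach on a small frame, something like the following might help. It is only a sketch: the column name `text`, the sample rows, and the shortened `filter_list` are hypothetical, and it filters a single column directly rather than going through `apply`.

```python
import re
import pandas as pd

# Hypothetical stand-in for the question's filter_list (assumption: n = 9).
filter_list = [f"[{i}]" for i in range(1, 10)]

# Hypothetical data: one row really contains "[3]", the others only look risky.
df = pd.DataFrame({"text": ["£55", "2010)", "55*", "see [3]", "plain"]})

# Escape each item so the brackets are matched literally, then OR them together.
pattern = "|".join(re.escape(f) for f in filter_list)

# True for rows to keep, i.e. rows that do NOT contain any "[k]" item.
df_mask = ~df["text"].str.contains(pattern, flags=re.IGNORECASE)

kept = df[df_mask]
print(kept)
```

With the escaped pattern, only the row containing a literal [3] should be dropped, while rows such as £55 and 2010) stay in the frame.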
