13

I'm attempting to select rows from a dataframe using the pandas str.contains() function with a regular expression that contains a variable as shown below.

import pandas as pd

df = pd.DataFrame(["A test Case","Another Testing Case"], columns=list("A"))
variable = "test"
df[df["A"].str.contains(r'\b' + variable + '\b', regex=True, case=False)] #Returns nothing

While the above returns nothing, the following returns the appropriate row as expected

df[df["A"].str.contains(r'\btest\b', regex=True, case=False)] #Returns values as expected

Any help would be appreciated.

neanderslob
  • 1
    Perhaps your issue is that you are concatenating the raw strings to a standard string?? Maybe try `fr'\b{variable}\b'` – nicholishen Dec 04 '18 at 22:05

3 Answers

21

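Both word boundary escapes must be inside raw strings. In your code the trailing '\b' is an ordinary string literal, so Python reads it as a backspace character (\x08) instead of passing a regex word boundary to the pattern. Why not use some sort of string formatting instead? String concatenation is generally discouraged for building patterns like this.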

df[df["A"].str.contains(fr'\b{variable}\b', regex=True, case=False)] 
# Or, 
# df[df["A"].str.contains(r'\b{}\b'.format(variable), regex=True, case=False)] 

             A
0  A test Case
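
To make the difference visible, here is a minimal sketch that compares the concatenated pattern from the question with a raw-string version. The names `broken`, `fixed`, and `pattern` are only illustrative, and wrapping the variable in `re.escape` is an extra precaution I am assuming you would want if `variable` could ever contain regex metacharacters.

```python
import re

import pandas as pd

df = pd.DataFrame(["A test Case", "Another Testing Case"], columns=["A"])
variable = "test"

# '\b' in a normal string literal is a single backspace character (\x08);
# r'\b' is a backslash followed by 'b', which the regex engine treats as a word boundary.
broken = r'\b' + variable + '\b'
fixed = r'\b' + variable + r'\b'

print(repr(broken))  # the trailing backspace shows up as \x08
print(repr(fixed))

# Assumption: escape the variable so characters like '.' or '(' are matched literally.
pattern = rf'\b{re.escape(variable)}\b'

matches = df[df["A"].str.contains(pattern, regex=True, case=False)]
print(matches)
```

Only the first row is selected, because "Testing" has no word boundary directly after "Test".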
cs95
  • How would you do this if you had to specify the number of characters, as with `[0-9]{3}` for a pattern of three digits? I was just facing this problem, so I used string concatenation, which solved it; the f-string didn't work. – Erfan May 11 '19 at 11:50
  • @Erfan the standard method is to escape the curly braces. If memory serves, that would be {{3}}. – cs95 May 11 '19 at 15:16
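
The brace escaping mentioned in the comments can be sketched as follows. The `prefix` variable and the sample strings are hypothetical, chosen only to show that doubled braces in an f-string come out as single literal braces in the final pattern.

```python
import re

prefix = "ID"  # hypothetical variable to interpolate into the pattern

# Inside an f-string, {{ and }} produce literal { and },
# so {{3}} becomes the regex quantifier {3}.
pattern = rf'\b{re.escape(prefix)}[0-9]{{3}}\b'
print(pattern)

three_digits = re.search(pattern, "ref ID123 ok")   # exactly three digits after the prefix
two_digits = re.search(pattern, "ref ID12 ok")      # too few digits
four_digits = re.search(pattern, "ref ID1234 ok")   # no word boundary after the third digit
```

The same pattern string can be passed to `str.contains` in place of the one in the answer above.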
0

The following command works for me:
df.query('text.str.contains(@variable)')

-2

I had the exact same problem when passing a variable to str.contains(variable).

Try using str.contains(variable, regex=False)

It worked for me perfectly.
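
One caveat worth checking before relying on this: with `regex=False` the pattern is treated as a plain substring, so there is no word-boundary matching at all. A small sketch using the question's data (the result names are just illustrative) suggests this returns both rows rather than only the whole-word match:

```python
import pandas as pd

df = pd.DataFrame(["A test Case", "Another Testing Case"], columns=["A"])
variable = "test"

# Plain substring search: "Testing" contains "test" when case is ignored.
substring_hits = df[df["A"].str.contains(variable, regex=False, case=False)]

# Whole-word search with both boundaries inside a raw f-string.
word_hits = df[df["A"].str.contains(rf'\b{variable}\b', regex=True, case=False)]

print(substring_hits)
print(word_hits)
```

So this approach fits when any occurrence of the text is acceptable, but not when the match has to be a whole word as in the question.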