
There are many answers to questions similar to mine, but I can't work out why my code isn't working.

I've got a very simple example of two strings, where I am checking if one is contained in another (with an exact match).

For example, suppose I've got the following:

import re

text = "random/path/" 
search = "test/random/path/path_with_brackets[3]/another_path"

if re.search(r"\b{}\b".format(text), search, re.IGNORECASE) is not None:
    print("text is contained in search")
else:
    print("text not contained in search")

As expected, the above code returns:

text is contained in search

since an exact match of "random/path/" is found in "test/random/path/path_with_brackets[3]/another_path".

However, if I add an extra path (that contains brackets) to the text, such as:

import re

text = "random/path/path_with_brackets[3]" 
search = "test/random/path/path_with_brackets[3]/another_path"

if re.search(r"\b{}\b".format(text), search, re.IGNORECASE) is not None:
    print("text is contained in search")
else:
    print("text not contained in search")

the text is not found in search, even though it exists. The result is:

text not contained in search

What am I doing wrong in the second example? Does the fact that "text" has brackets change anything?

Adam
  • Yes, brackets are the reason. Brackets are special characters in regular expressions. Do you even need regular expressions? It seems like plain substring matching would be enough. `if text in search:` – John Gordon Mar 31 '20 at 14:57
  • You sir are indeed correct. This was so simple and I don't know why I complicated it all. Your suggestion worked perfectly. – Adam Mar 31 '20 at 14:59

2 Answers

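Use "replace" to escape the brackets, and wrap the pattern in .* to match zero or more of any character on either side. (re.search already scans the whole string, so the .* wrappers are optional.)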

import re

text = "random/path/path_with_brackets[3]"
# Escape both brackets; raw strings avoid invalid-escape warnings
text = text.replace('[', r'\[').replace(']', r'\]')
search = "test/random/path/path_with_brackets[3]/another_path"

if re.search(r".*{}.*".format(text), search, re.IGNORECASE) is not None:
    print("text is contained in search")
else:
    print("text not contained in search")
m8factorial

If you don't need regex, you can simply use the `in` operator:

print(text in search)  # -> True
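
One difference worth noting: the regex in the question passes re.IGNORECASE, while `in` compares case-sensitively. If case-insensitivity matters, a minimal sketch is to fold the case of both strings first. The mixed-case `text` value below is made up for illustration and is not from the question.

```python
# Hypothetical mixed-case input, chosen only to show the difference
text = "Random/Path/path_with_brackets[3]"
search = "test/random/path/path_with_brackets[3]/another_path"

# Plain substring test is case-sensitive
plain = text in search

# Folding both sides gives a case-insensitive substring test
folded = text.casefold() in search.casefold()

print(plain, folded)  # False True
```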

If you do need regex, for example because word boundaries matter (you don't want random to match inside get_random), then you'll need to escape the brackets, since they're special: they denote a character set. E.g. [3] matches the single character 3, not the literal text [3]. You can do that with re.escape:

r"\b{}\b".format(re.escape(text))
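
To see the effect of escaping in isolation, here is a small sketch without the \b anchors. The `fragment` value is just the last path component from the question, and the variable names are illustrative.

```python
import re

fragment = "path_with_brackets[3]"  # illustrative value taken from the question

# Unescaped, [3] is a character set, so the pattern expects a literal "3" there
unescaped_hits_digit = re.search(fragment, "path_with_brackets3") is not None
unescaped_hits_literal = re.search(fragment, "path_with_brackets[3]") is not None

# re.escape backslash-escapes the brackets so they match literally
escaped_hits_literal = re.search(re.escape(fragment), "path_with_brackets[3]") is not None

print(unescaped_hits_digit, unescaped_hits_literal, escaped_hits_literal)
# True False True
```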

But then you have another problem: in ]/ both characters are non-word characters, so there is no word boundary between them and the trailing \b won't match there. To fix it you can use a similar concept to \b:

r"(?:^|\W)({})(?:$|\W)".format(...)

These are non-capturing groups that match either the start/end of the string or a non-word character.

It also makes sense to put your desired text in a group so that you can retrieve it with .group(1).
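
Putting the pieces together, a minimal runnable sketch might look like the following. The helper name `contains_delimited` is hypothetical, and it assumes `re.escape(text)` is what gets formatted into the pattern.

```python
import re

def contains_delimited(text, search):
    """Return a match object if `text` occurs in `search`, bounded on each
    side by the start/end of the string or a non-word character."""
    pattern = r"(?:^|\W)({})(?:$|\W)".format(re.escape(text))
    return re.search(pattern, search, re.IGNORECASE)

search = "test/random/path/path_with_brackets[3]/another_path"

# Bounded by "/" on both sides, so this matches
m = contains_delimited("random/path/path_with_brackets[3]", search)
found = m.group(1) if m else None
print(found)  # random/path/path_with_brackets[3]

# Preceded by the word character "r", so this does not match
partial = contains_delimited("andom/path", search)
print(partial)  # None
```

Note that the outer groups consume the delimiter characters, which is why the text itself is read from .group(1) rather than .group(0). If consuming them is a problem, the lookarounds (?<!\w) and (?!\w) are a possible alternative.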

wjandrea