Check exact match of string with brackets in another string Python

Question

There are many answers to questions similar to mine, but I'm not sure why this isn't working.

I've got a very simple example of two strings, where I am checking if one is contained in another (with an exact match).

For example, suppose I've got the following:

import re

text = "random/path/" 
search = "test/random/path/path_with_brackets[3]/another_path"

if re.search(r"\b{}\b".format(text), search, re.IGNORECASE) is not None:
    print("text is contained in search")
else:
    print("text not contained in search")

As expected, the above code returns:

text is contained in search

since an exact match of "random/path/" is found in "test/random/path/path_with_brackets[3]/another_path"

However, if I add an extra path (that contains brackets) to the text, such as:

import re

text = "random/path/path_with_brackets[3]" 
search = "test/random/path/path_with_brackets[3]/another_path"

if re.search(r"\b{}\b".format(text), search, re.IGNORECASE) is not None:
    print("text is contained in search")
else:
    print("text not contained in search")

the text is not found in search, even though it exists. The result is:

text not contained in search

What am I doing wrong in the second example? Does the fact that "text" has brackets change anything?

Yes, brackets are the reason. Brackets are special characters in regular expressions. Do you even need regular expressions? It seems like plain substring matching would be enough. `if text in search:` — John Gordon, Mar 31 '20 at 14:57
You sir are indeed correct. This was so simple and I don't know why I complicated it all. Your suggestion worked perfectly. — Adam, Mar 31 '20 at 14:59

score 1 · Answer 1 · answered Mar 31 '20 at 15:21

Try using str.replace to escape the opening bracket, and use .* on either side to match zero or more of any character.

import re

text = "random/path/path_with_brackets[3]"
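# Raw string avoids the invalid "\[" escape-sequence warning;
# a lone "]" needs no escaping outside a character set.
text = text.replace('[', r'\[')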
search = "test/random/path/path_with_brackets[3]/another_path"

if re.search(r".*{}.*".format(text), search, re.IGNORECASE) is not None:
    print("text is contained in search")
else:
    print("text not contained in search")

m8factorial

Use `re.escape` for a more thorough replacement. I posted an answer including it. – wjandrea Mar 31 '20 at 16:11

wjandrea · Answer 2 · 2020-03-31T15:28:29.183

If you don't need to use regex, you can simply use in:

print(text in search)  # -> True

If you do need regex, for example because word boundaries matter (you don't want random to match inside get_random), then you'll need to escape the brackets, since they're special: they define a character set. E.g. [3] matches the single character 3, not the literal text [3]. You can do that with re.escape:

r"\b{}\b".format(re.escape(text))
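To see what the escaping changes by itself, here is a quick sketch that leaves the \b anchors out and only compares the raw text with the re.escape'd text as a pattern. The variable names unescaped_match and escaped_match are just for illustration.

```python
import re

text = "random/path/path_with_brackets[3]"
search = "test/random/path/path_with_brackets[3]/another_path"

# Unescaped: "[3]" is read as a character set, so the pattern looks for
# "path_with_brackets3" and never matches the literal brackets.
unescaped_match = re.search(text, search)

# Escaped: re.escape backslash-escapes the regex metacharacters,
# so "[3]" is matched literally.
escaped_match = re.search(re.escape(text), search)

print(unescaped_match is None)    # True
print(escaped_match is not None)  # True
```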

But then you have another problem: there's no word boundary between ] and /, since both are non-word characters, so \b won't match there. To fix it you can use a similar concept to \b:

r"(?:^|\W)({})(?:$|\W)".format(...)

These are non-capturing groups that match either the start/end of the string or a non-word character.

It also makes sense to put your desired text in a group so that you can retrieve it with .group(1).
