362

I'd like to use a variable inside a regex, how can I do this in Python?

TEXTO = sys.argv[1]

if re.search(r"\b(?=\w)TEXTO\b(?!\w)", subject, re.IGNORECASE):
    # Successful match
else:
    # Match attempt failed
tripleee
Pedro Lobito

11 Answers

346

You have to build the regex as a string:

import re
import sys

TEXTO = sys.argv[1]
my_regex = r"\b(?=\w)" + re.escape(TEXTO) + r"\b(?!\w)"

if re.search(my_regex, subject, re.IGNORECASE):
    ...  # handle the match

Note the use of re.escape so that if your text has special characters, they won't be interpreted as such.
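
To see why re.escape matters, here is a minimal sketch. The helper name whole_word_search and the sample strings are made up for illustration; only re.escape and re.search come from the standard library.

```python
import re

def whole_word_search(text, subject):
    """Search for `text` literally, using the boundary pattern from the question."""
    pattern = r"\b(?=\w)" + re.escape(text) + r"\b(?!\w)"
    return re.search(pattern, subject, re.IGNORECASE)

# Without re.escape, the dot in "a.b" is a wildcard and also matches "axb".
unescaped = r"\b(?=\w)" + "a.b" + r"\b(?!\w)"
print(re.search(unescaped, "see axb here") is not None)

# With re.escape, only the literal text "a.b" matches.
print(whole_word_search("a.b", "see axb here") is not None)
print(whole_word_search("a.b", "see a.b here") is not None)
```

The escaped version treats the input purely as text, which is usually what you want for something read from sys.argv.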

Ned Batchelder
  • 4
    What if your variable goes first? `r'' + foo + 'bar'` ? – deed02392 Dec 06 '13 at 17:24
  • @deed02392 `r''` not necessary if you do `re.escape(foo)`, which you should anyway. Actually, I think `re` interprets whatever it's given as a unicode string regardless of whether you prefix `r` or not. – OJFord Aug 13 '14 at 10:23
  • 1
    Does .format() work as well in place of the re.escape or is re.escape() necessary? – Praxiteles Feb 03 '16 at 09:59
  • @praxiteles did u find the answer? – Pedro Lobito Dec 09 '17 at 02:43
  • 2
    I'm not sure if this works if I need to have a group of which the variable is a part. Other answers below look more intuitive for that, and don't break the regex into several expressions. – gdvalderrama Dec 14 '17 at 09:15
  • The r'' just tells Python to treat its content as 'raw', else you'd need to escape backslashes etc. See https://docs.python.org/3/howto/regex.html#the-backslash-plague It's not needed to use any r'' for building a regex, though. – MKesper May 07 '19 at 12:44
  • r means a raw string. regular expressions use raw strings – Golden Lion Oct 11 '21 at 21:18
216

From Python 3.6 onward you can also use literal string interpolation (f-strings). In your particular case the solution would be:

if re.search(rf"\b(?=\w){TEXTO}\b(?!\w)", subject, re.IGNORECASE):
    ...  # do something

EDIT:

Since there have been some questions in the comments about how to deal with special characters, I'd like to extend my answer:

raw strings ('r'):

One of the main concepts you have to understand when dealing with special characters in regular expressions is to distinguish between string literals and the regular expression itself. It is very well explained here:

In short:

Let's say that instead of finding a word boundary \b after TEXTO you want to match the literal string \boundary. Then you have to write:

TEXTO = "Var"
subject = r"Var\boundary"

if re.search(rf"\b(?=\w){TEXTO}\\boundary(?!\w)", subject, re.IGNORECASE):
    print("match")

This only works because we are using a raw string (the regex is prefixed with 'r'); otherwise we would have to write "\\\\boundary" in the regex (four backslashes). Additionally, without the 'r' prefix, '\b' would no longer be interpreted as a word boundary but converted to a backspace character!
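
The difference between the raw and the non-raw literal can be checked directly. This is a small sketch with made-up variable names (plain, raw, raw_hit, plain_hit):

```python
import re

plain = "\b"   # non-raw: Python turns this into one backspace character
raw = r"\b"    # raw: two characters, a backslash followed by 'b'
print(len(plain), len(raw))

# A literal backslash needs two backslashes in the regex, hence four
# in a non-raw string literal and two in a raw one.
print("\\\\boundary" == r"\\boundary")

raw_hit = re.search(r"\bVar\b", "a Var here")    # \b is a word boundary
plain_hit = re.search("\bVar\b", "a Var here")   # pattern contains backspaces
print(raw_hit is not None, plain_hit is not None)
```

Only the raw pattern finds the word, because the non-raw one looks for backspace characters that are not in the subject.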

re.escape:

Basically puts a backslash in front of any special character. Hence, if you expect a special character in TEXTO, you need to write:

if re.search(rf"\b(?=\w){re.escape(TEXTO)}\b(?!\w)", subject, re.IGNORECASE):
    print("match")

NOTE: For Python 3.7 and later, !, ", %, ', ,, /, :, ;, <, =, >, @, and ` are not escaped by re.escape. Only characters that have a special meaning in a regex are still escaped. _ has not been escaped since Python 3.3.
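
A quick way to see what re.escape does on your interpreter is to print a few samples. The sketch below assumes Python 3.7 or later; is_literal_safe is a hypothetical helper name, not part of the re module.

```python
import re

# Show how a few sample strings are escaped.
for sample in [".", ":", "a_b", "c++"]:
    print(repr(sample), "->", repr(re.escape(sample)))

def is_literal_safe(text):
    """Check that the escaped form of `text` matches exactly `text`."""
    return re.fullmatch(re.escape(text), text) is not None

print(is_literal_safe("c++ (v2)?"))
```

Whatever the exact set of escaped characters is in a given version, the escaped pattern should always match the original text literally, which is the property the helper checks.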

Curly braces:

If you want to use quantifiers within the regular expression using f-strings, you have to use double curly braces. Let's say you want to match TEXTO followed by exactly 2 digits:

if re.search(rf"\b(?=\w){re.escape(TEXTO)}\d{{2}}\b(?!\w)", subject, re.IGNORECASE):
    print("match")
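
If the repeat count itself comes from a variable, the braces end up tripled: the doubled pair produces the literal { and }, and the inner pair interpolates the value. A minimal sketch, where TEXTO and n_digits are example values chosen for illustration:

```python
import re

TEXTO = "item"   # example search term
n_digits = 2     # example repeat count

# {{ and }} become literal braces; {n_digits} is replaced by its value.
pattern = rf"\b(?=\w){re.escape(TEXTO)}\d{{{n_digits}}}\b(?!\w)"
print(pattern)

print(re.search(pattern, "buy item42 now", re.IGNORECASE) is not None)
print(re.search(pattern, "buy item421 now", re.IGNORECASE) is not None)
```

With only double braces around the name, the f-string would emit the text {n_digits} literally instead of the number.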
airborne
  • 10
    As of 2020, this is the simplest and most pythonic way to use a variable inside a regular expression – Pedro Lobito Jan 17 '20 at 05:35
  • 7
    This is definitely a **WOW**. – Jia Gao Jan 25 '20 at 18:08
  • 3
    can someone explain the significance of "rf" here – Harsha Reddy Feb 25 '20 at 10:34
  • 3
    @HarshaReddy: 'r': This string is a raw string: If you don't use it, '\b' will be converted to the backspace character (https://docs.python.org/3/howto/regex.html#more-pattern-power). 'f' tells python that this is an 'f-string', s. link above, and enables you to write the variable into the curly braces- – airborne Feb 25 '20 at 11:36
  • 1
    Does this escape special characters in the substituted string? Just from a little testing, it doesn't seem to. `re.escape()` is still required – Tugzrida Mar 11 '20 at 10:02
  • @Tugzrida: yes, of course if you expect user input containing special characters you need to write "if re.search(rf"\b(?=\w){re.escape(TEXTO)}\b(?!\w)", subject, re.IGNORECASE)". The 'r' only tells python that the literal string is raw. "re.escape" basically puts a backslash in front of every special character. – airborne Mar 11 '20 at 14:54
  • 6
    How to write quantifiers in f-strings: `fr"foo{{1,5}}"` (double the braces) – PunchyRascal Mar 19 '20 at 18:28
  • For python >=3.6 the chars you wrote are not escaped.I would say they **do not have to be escaped**. If you want to have a dot instead, you still (also in new Python versions) have to escape it with `\.` – Timo Dec 01 '20 at 17:53
  • I got : `splitter = 1;str_='unbenannt.png28.png';mat =re.match(f'unbenannt\.png\d{{splitter}}\.png$', str_)` that does not grasp the string. This works: ` mat = re.match('unbenannt\.png\d{'+str(splitter)+'}\.png$', str_)` – Timo Dec 01 '20 at 18:20
  • In quantifiers one can split the regex string `r"[a-f]{1," f"{max}" r"}"` – LeBlue Dec 06 '20 at 19:05
  • @Timo: No, they are not escaped! My comment refers to re.escape. re.escape('.') outputs '\\.'. re.escape(':') outputs ':'. Try it out in the python console if you don't believe me. (>=3.7 as stated). – airborne Dec 22 '20 at 16:14
  • 1
    @airborne Awesome.. U explained it very well in detailed. This answer helped me a lot. Thank you so much. I appreciate you for your detailed desription. Never knew that just combining raw-string and f-string is possible and can be used with a variable. Definitely a salute to your explanation! – Priya Jul 08 '21 at 09:53
  • 1
    @airborne, do you mean "backslash" instead of "backspace" in your description of **re.escape**? – Aphoid Aug 29 '22 at 15:29
53
if re.search(r"\b(?=\w)%s\b(?!\w)" % TEXTO, subject, re.IGNORECASE):

This inserts the contents of TEXTO into the regex as a plain string. Wrap it in re.escape() if it may contain regex metacharacters.

Wiktor Stribiżew
Bo Buchanan
42
rx = r'\b(?=\w){0}\b(?!\w)'.format(TEXTO)
Wiktor Stribiżew
Cat Plus Plus
9

I find it very convenient to build a regular expression pattern by stringing together multiple smaller patterns.

import re

string = "begin:id1:tag:middl:id2:tag:id3:end"
re_str1 = r'(?<=(\S{5})):'
re_str2 = r'(id\d+):(?=tag:)'
re_pattern = re.compile(re_str1 + re_str2)
match = re_pattern.findall(string)
print(match)

Output:

[('begin', 'id1'), ('middl', 'id2')]
6

I agree with all of the above, unless sys.argv[1] is itself meant to be a regex, for example:

Chicken\d{2}-\d{2}An\s*important\s*anchor

In that case you would not want to use re.escape, because you want the argument to behave like a regex:

TEXTO = sys.argv[1]

if re.search(r"\b(?=\w)" + TEXTO + r"\b(?!\w)", subject, re.IGNORECASE):
    # Successful match
else:
    # Match attempt failed
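
When the argument is used as a regex rather than as literal text, it can also be an invalid regex. One possible way to guard against that is sketched below. The function name compile_user_pattern is hypothetical, and wrapping the fragment in a non-capturing group is a design choice of this sketch (so that an alternation such as a|b stays inside the boundaries), not something the answer prescribes.

```python
import re

def compile_user_pattern(fragment):
    """Return a compiled pattern, or None if `fragment` is not a valid regex."""
    try:
        # (?: ... ) keeps alternations in `fragment` between the boundaries.
        return re.compile(r"\b(?=\w)(?:" + fragment + r")\b(?!\w)", re.IGNORECASE)
    except re.error:
        return None

good = compile_user_pattern(r"Chicken\d{2}")
bad = compile_user_pattern("[a-")   # unterminated character set

print(good is not None and good.search("one chicken42 please") is not None)
print(bad is None)
```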
Pedro Lobito
Max Carroll
3

You can also use str.format as syntactic sugar:

re_genre = r'{}'.format(your_variable)
regex_pattern = re.compile(re_genre)  
Kevin Chou
2

I needed to search for usernames that are similar to each other, and what Ned Batchelder said was incredibly helpful. However, I found I had cleaner output when I used re.compile to create my re search term:

pattern = re.compile(r"(" + username + ".*):(.*?):(.*?):(.*?):(.*)")
matches = re.findall(pattern, lines)

Output can be printed using the following:

print(matches[0])     # prints the tuple of captured groups for the first matching line
print(matches[0][3])  # prints the fourth capture group (set by the parentheses in the regex) of the first match
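
A self-contained variant of the same idea is sketched below. It assumes colon-separated, passwd-style input; the sample data, the use of re.escape on the username, and the ^...$ anchors with re.MULTILINE are additions for illustration, not part of the original snippet.

```python
import re

username = "j.doe"  # example value
lines = (
    "j.doe:x:1001:1001:John Doe\n"
    "jxdoe:x:1002:1002:Jx Doe\n"
)

# Escape the username so its dot is literal; [^:]* keeps each field
# from running past the next colon.
pattern = re.compile(
    r"^(" + re.escape(username) + r"[^:]*):([^:]*):([^:]*):([^:]*):(.*)$",
    re.MULTILINE,
)
matches = pattern.findall(lines)

print(matches[0])      # tuple of the five captured fields
print(matches[0][3])   # fourth field of the first match
```

Because the username is escaped, the second line (jxdoe) is not picked up by the dot in j.doe.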
jdelaporte
1

Here's another format you can use (tested on Python 3.7):

regex_str = r'\b(?=\w)%s\b(?!\w)' % TEXTO

I find it useful when you can't use {} as the placeholder for the variable, for example because the regex itself contains braces (here the placeholder is %s instead).

Ardhi
0

You can also use the format method for this. It replaces the {} placeholder with the variable you pass to it as an argument.

if re.search(r"\b(?=\w){}\b(?!\w)".format(TEXTO), subject, re.IGNORECASE):
    # Successful match
else:
    # Match attempt failed
Haneef Mohammed
-1

One more example.

I have a configus.yml that describes my flow files:

"pattern":
  - _(\d{14})_
"datetime_string":
  - "%m%d%Y%H%M%f"

In the Python code I use:

data_time_real_file = re.findall(flows[flow]["pattern"][0], latest_file)
Pedro Lobito
Nikolay Baranenko