Is it possible to use a variable inside a regular expression such as the following?


import re

target = ["New York", "the most"]
# Capture the three words before "New York" and, via the lookahead, the three words after it
regex = r"((?:\w+\W+){3})(?=New York((?:\W+\w+){3}))"

test_str = "The City of New York often called New York City or simply New York is the most populous city in the " \
           "United States. With an estimated 2016 population of 8537673 distributed over a land area of about 3026 " \
           "square miles (784 km2) New York City is also the most densely populated major city in the United States."

matches = re.finditer(regex, test_str)

for match in matches:
    print(re.sub(r'\W+', ' ', match.group(1))+"  <------>" +re.sub(r'\W+', ' ', match.group(2)))

Ideally I would like it to loop through each element in the "target" list above and extract the three words to the left and right of each phrase. The code above does this, but only when the search term is hard-coded in the regular expression.

Thanks in advance

Wiktor Stribiżew
JFDA_64

1 Answer

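You can use an f-string. Double the braces of the {3} quantifier so they are kept as literal braces rather than read as replacement fields, and pass the phrase through re.escape in case it contains regex metacharacters:

for phrase in target:
    regex = rf"((?:\w+\W+){{3}})(?={re.escape(phrase)}((?:\W+\w+){{3}}))"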
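
To show how this fits into the loop from the question, here is a minimal sketch. The helper name context_around and its parameter n are made up for illustration and are not part of the original code; the sample sentence is a shortened version of the question's test string.

```python
import re


def context_around(phrase, text, n=3):
    """Hypothetical helper: yield (left, right) tuples holding the n words
    before and after each occurrence of phrase in text."""
    # {{ and }} produce literal braces, so {{{n}}} renders as {3} when n == 3.
    # re.escape makes sure the phrase is matched literally.
    pattern = re.compile(
        rf"((?:\w+\W+){{{n}}})(?={re.escape(phrase)}((?:\W+\w+){{{n}}}))"
    )
    for m in pattern.finditer(text):
        # Collapse punctuation and whitespace runs to single spaces
        left = re.sub(r"\W+", " ", m.group(1)).strip()
        right = re.sub(r"\W+", " ", m.group(2)).strip()
        yield left, right


target = ["New York", "the most"]
sample = ("The City of New York often called New York City or simply "
          "New York is the most populous city in the United States.")

for phrase in target:
    for left, right in context_around(phrase, sample):
        print(f"{left}  <------>  {right}")
```

Compiling the pattern once per phrase avoids rebuilding it for every match. Note that a phrase with fewer than n words on either side will not be reported, which is the same behaviour as the original pattern.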
Kelo