Setup

I dynamically build a list of regexes, regex_list. Each regex in the list is guaranteed to match the text it is applied to at least once. Some regexes in the list may be identical.

import re

regex_list = []
for f in foo:  # foo is a list of strings, e.g. foo = ['foo1', 'foo2', 'foo1', ...]
    # f is a valid expression to be used inside the regex
    # raw f-string so that \. reaches the regex engine as an escaped dot
    regex_list.append(rf'[^.]*?{f}[^.]*\.')

regex = re.compile('|'.join(regex_list), flags=re.DOTALL)
result = regex.findall(text)

Problem

Since

  1. some regexes in regex_list may be identical
  2. the regexes in regex_list are combined with the OR operator (|)

for any regex that has another copy in the list, only the first match in the text is captured.
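
One way to check what the combined pattern does with duplicates is a small experiment. The sketch below uses made-up sample data (the `text` string and the three-item `foo` list are assumptions, not taken from the question) and compares the joined pattern with a de-duplicated version of it. Since `re.findall` returns non-overlapping matches and each alternative here consumes a whole sentence, a repeated alternative should not contribute any matches of its own.

```python
import re

# Hypothetical sample data, chosen only for illustration.
text = "alpha foo1 one. beta foo2 two. gamma foo1 three."
foo = ["foo1", "foo2", "foo1"]  # note the duplicate 'foo1'

# Same pattern shape as in the question, written as raw f-strings.
parts = [rf"[^.]*?{f}[^.]*\." for f in foo]

# Pattern joined exactly as in the question (duplicates included).
combined = re.compile("|".join(parts))

# Same pattern with duplicate alternatives removed, order preserved.
deduped = re.compile("|".join(dict.fromkeys(parts)))

matches_combined = combined.findall(text)
matches_deduped = deduped.findall(text)

print(matches_combined)
print(matches_combined == matches_deduped)
```

On this sample both patterns return the same list, which suggests that duplicates in regex_list are redundant and can be dropped before joining.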

Question

A workaround would be to apply each regex individually in a for loop, but that is very slow.

Is there a good way to combine the regexes so that they match everything possible?

albero
  • What have you got inside `foo`? A couple of examples would do. However, it seems you just have independent regexps that need to be executed one by one on a text. – Wiktor Stribiżew Apr 22 '21 at 11:34
  • `foo` is a list of strings, e.g. `foo = ['foo1', 'foo2', 'foo1', ...]` – albero Apr 22 '21 at 12:03
  • Like `['John', 'John Doe', 'John Doe Junior']`? – Wiktor Stribiżew Apr 22 '21 at 12:04
  • Yes, I've added an example – albero Apr 22 '21 at 12:06
  • Ok, you will be able to use [this approach](https://stackoverflow.com/a/42789508/3832970). Probably, with [overlapping regex](https://stackoverflow.com/questions/11430863/how-to-find-overlapping-matches-with-a-regexp), you can get closer. However, `[^.]*?{f}[^.]*\.` pattern means the `[^.]` might just eat any other potential matches. You will need to post-process matches, I am afraid. – Wiktor Stribiżew Apr 22 '21 at 12:07

1 Answer

I discovered by chance that applying each regex individually in a for loop is very slow with the re module, but surprisingly faster with the regex module.
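
A minimal sketch of that per-pattern loop follows. It assumes the third-party `regex` package is installed (`pip install regex`) and falls back to the standard `re` module otherwise, since both expose `compile` and `findall`. The helper name `find_per_pattern` and the sample text are hypothetical, and no timing claim is made here; measure on your own data.

```python
try:
    import regex as re_engine  # third-party package: pip install regex
except ImportError:
    import re as re_engine  # standard-library fallback; same calls are used below


def find_per_pattern(keywords, text):
    """Return {keyword: [matches]}, running one compiled pattern per distinct keyword."""
    results = {}
    # dict.fromkeys drops duplicate keywords while keeping their original order.
    for kw in dict.fromkeys(keywords):
        pattern = re_engine.compile(rf"[^.]*?{kw}[^.]*\.")
        results[kw] = pattern.findall(text)
    return results


# Hypothetical sample data, chosen only for illustration.
text = "alpha foo1 one. beta foo2 two. gamma foo1 three."
hits = find_per_pattern(["foo1", "foo2", "foo1"], text)
print(hits)
```

Keeping the results keyed by keyword also makes it easy to see which entry of foo produced which sentence, something the single joined pattern does not report.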

albero