
I have several substrings that I want to search for, say `substrings = ['ABC', 'ABCDE']`, and several strings such as `'xyzABCxyz'` and `'xyzABCDExyz'`. The regex pattern I pass to `re.search()` is `'(%s)' % '|'.join(substrings)`. Searching in `'xyzABCxyz'` works just fine, but in `'xyzABCDExyz'` all I get is `'ABC'`.

My question is: how can I make the regex return the longest matching substring instead of stopping at the shortest one? All that comes to mind is changing the order of the substrings, but I'm looking for something more elegant.
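
A minimal reproduction of the behaviour described above, assuming Python's standard `re` module; the variable names `short_match` and `long_match` are only illustrative:

```python
import re

substrings = ['ABC', 'ABCDE']
# Builds the pattern '(ABC|ABCDE)': the shorter alternative comes first.
pattern = '(%s)' % '|'.join(substrings)

short_match = re.search(pattern, 'xyzABCxyz').group(1)
long_match = re.search(pattern, 'xyzABCDExyz').group(1)

print(short_match)  # ABC
print(long_match)   # ABC, because 'ABC' is tried first and already succeeds
```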

Anna
  • 1
    You could sort the substrings by length, longest first, and then join them. Or you could use lookarounds. – fardjad May 25 '18 at 13:56
  • 1
    Just sort the substrings descending by size, then iterate and use `re.search`, stopping with the first match, which would also coincide with the longest substring. There are also non regex solutions here. – Tim Biegeleisen May 25 '18 at 13:57
  • 2
    Not sure it's related to your issue, but note that alternation (the OR) in a regex capture group is not greedy: alternatives are tried left to right and the first one that matches wins. For example, `/(foo|foobar)/` would only match "foo" in "going to the foobar.", but `/(foobar|foo)/` would match "foobar". – LukStorms May 25 '18 at 13:59
  • 1
    There wouldn't be a more elegant way to do it. Just sort it by length in descending order. – revo May 25 '18 at 14:15
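
The suggestion repeated in the comments (sort by length, longest first, then join) could look roughly like the sketch below. The helper name `build_longest_first_pattern` is hypothetical, and the `re.escape` call is an added precaution for substrings that contain regex metacharacters, not something the comments mention:

```python
import re

def build_longest_first_pattern(substrings):
    """Join substrings into one alternation, longest alternatives first."""
    # Sorting by descending length means 'ABCDE' is tried before 'ABC'.
    ordered = sorted(substrings, key=len, reverse=True)
    # re.escape keeps characters such as '.' or '+' literal (assumed desirable).
    return '(%s)' % '|'.join(re.escape(s) for s in ordered)

substrings = ['ABC', 'ABCDE']
pattern = build_longest_first_pattern(substrings)

result_long = re.search(pattern, 'xyzABCDExyz').group(1)
result_short = re.search(pattern, 'xyzABCxyz').group(1)

print(result_long)   # ABCDE
print(result_short)  # ABC
```

Note that this still returns the leftmost match in the string. Sorting only guarantees that, at a given starting position, the longest listed substring is preferred.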
