Is there a way (third-party libraries are OK) to locate a regex match that envelops a specific "anchor" position without iterating through all the matches?

For example, we have a string and a position X in it. I want to find a match of a specific regex whose span includes X.

My specific scenario is about non-break conditions for punctuation marks. E.g. we have a `.` character at X, and I want to ignore it as a break when it falls inside a match of `[0-9]+[.][0-9]+`.

As the string may be quite long, I need an efficient way of doing it, without the need to check several matches until I'm at the right spot. The maximum length of the interval between the start of the match and the X is unknown.

Of course, iterating through the matches is also possible, but it's not efficient: while the number of punctuation marks is limited, the number of non-break conditions may be quite high.
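For reference, the iteration baseline described above might look like the following minimal Python sketch. The helper name `is_non_break_by_scan` and the use of Python's standard `re` module are assumptions made for illustration only; the point is that every match before the anchor has to be visited first.

```python
import re

# The non-break pattern from the question: digits, a period, digits.
NON_BREAK = re.compile(r"[0-9]+[.][0-9]+")

def is_non_break_by_scan(text, anchor):
    """Hypothetical baseline: walk the matches from the start of the text
    until one covers `anchor` or starts after it."""
    for m in NON_BREAK.finditer(text):
        if m.start() <= anchor < m.end():
            return True   # the anchor lies inside this match
        if m.start() > anchor:
            return False  # matches are ordered, so none can cover the anchor now
    return False
```

The cost grows with the distance from the start of the string to the anchor, which is what makes this unattractive for long texts.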

Vadim Berman
  • Use iterators to the relevant parts of the string. – Galik Mar 13 '18 at 06:39
  • Thanks, @Galik. If I understand you correctly though, it means mapping the matches first, and that's what I want to avoid... Even at the price of setting an approximate "maximum length". – Vadim Berman Mar 13 '18 at 06:44
  • You can't avoid the fact that the regex engine needs to examine every character until it finds some that match the pattern. – Galik Mar 13 '18 at 06:50
  • Yes - but, in theory, it can go backwards, or calculate the maximum number of characters (not for all regexes but at least some). No? – Vadim Berman Mar 13 '18 at 06:51
  • Honestly, I was hoping for some kind of secret switch, but it looks like I'll simply have to set that maximum length. It's possible that someone will come up with a magic option, but I'm not holding my breath :) – Vadim Berman Mar 13 '18 at 06:53
  • If `X` is a location, how could it be part of a regex? – revo Mar 13 '18 at 10:09
  • I mean that the location is part of the area captured by the regex. Is that better? – Vadim Berman Mar 13 '18 at 14:38
  • Could you provide sample input? – Thomas Ayoub Mar 13 '18 at 15:05
  • Sure. "Blah blah blah I am saying something. Then I'm saying something else **2.456** and then I'm finishing the text." The relevant period / full stop is somewhere in the middle, and the non-break is only relevant for the "2.456" substring. – Vadim Berman Mar 14 '18 at 09:45
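The "maximum length" fallback discussed in the comments could be sketched roughly as follows. This is an illustration under stated assumptions, not a confirmed solution: it uses Python's standard `re` module, the helper name `match_covering` and the default `max_len=32` are hypothetical, and only a window of text around the anchor is scanned.

```python
import re

# The non-break pattern from the question: digits, a period, digits.
NON_BREAK = re.compile(r"[0-9]+[.][0-9]+")

def match_covering(pattern, text, anchor, max_len=32):
    """Hypothetical helper: return the first match of `pattern` whose span
    contains `anchor`, scanning at most `max_len` characters on each side
    of it. Returns None if no such match is found in the window."""
    lo = max(0, anchor - max_len)
    hi = min(len(text), anchor + max_len + 1)
    # Compiled patterns accept pos/endpos, so the string is not copied.
    for m in pattern.finditer(text, lo, hi):
        if m.start() > anchor:
            break          # every later match starts after the anchor too
        if m.end() > anchor:
            return m       # start <= anchor < end
    return None

text = ("Blah blah blah I am saying something. Then I'm saying "
        "something else 2.456 and then I'm finishing the text.")
sentence_end = text.index("something.") + len("something")  # first period
decimal_point = text.index("2.456") + 1                     # period inside 2.456

print(match_covering(NON_BREAK, text, sentence_end))          # None: a real break
print(match_covering(NON_BREAK, text, decimal_point).group()) # 2.456
```

The work per punctuation mark is bounded by the window size rather than by the length of the text. The trade-offs are the ones already raised in the comments: a match longer than the window can be truncated or missed, and because `finditer` yields non-overlapping matches, an earlier match inside the window may hide an overlapping one that covers the anchor.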
