2

I'm specifically using Ruby, but I'm curious: say I'm trying to match a decimal point followed by at least five digits.

Here's the regexp: /(\.\d{5,})/

Without using a negative lookbehind, how would I make this only match if it follows either A) a space or tab or newline, or B) is the start of a string?

Max
  • Why are you trying to avoid negative lookbehind? Are there any other constructs you also need to avoid? – ruakh Jun 07 '17 at 05:36
  • In this case, javascript doesn't allow for negative lookbehind. – Max Jun 07 '17 at 06:41
  • But you said that you're using Ruby. If you want a JavaScript regular expression, you should ask for that, rather than for a Ruby regular expression that avoids one particular feature that Ruby regular expressions have but JavaScript regular expressions lack. – ruakh Jun 07 '17 at 06:45
  • @ruakh: His question is clear enough, he just wants to do that with Ruby flavors without using negative lookbehind. – Gawil Jun 07 '17 at 08:18
  • @Max: Look at this post, you might find something interesting : https://stackoverflow.com/questions/641407/javascript-negative-lookbehind-equivalent – Gawil Jun 07 '17 at 08:18
  • Can't it match the previous character? – SamWhan Jun 07 '17 at 10:58
  • @Gawil: My issue is that the OP's stated requirement is trivial to satisfy -- just use *positive* lookbehind (`/(?:^|(?<=[ \t\n]))(\.\d{5,})/`) -- but I suspect that doing so won't satisfy his/her *real* requirement. Conversely, if the stated requirement were "write a JavaScript regex that is equivalent to the Ruby regex `/(?:^|(?<=[ \t\n]))(\.\d{5,})/`", then that requirement would be literally *impossible* to satisfy, but it could nonetheless be possible to satisfy the *real* requirement, if we knew what that was. – ruakh Jun 07 '17 at 16:30
  • @Gawil My real requirement, as stated, is curiosity. I'm trying to learn more about regexp, and I wanted to know if I was missing a simple way to ignore a match if it followed a character/new line/space/start of string. – Max Jun 07 '17 at 16:32
  • @Max: in the link I posted above, there is a way to do that. It's not pure regex though, but it's close. – Gawil Jun 08 '17 at 12:52

1 Answer

4

Let's first consider how it would be done with a lookbehind. The lookbehind checks that what comes right before the captured text is either the start of a line or a whitespace character:

(?<=^|\s)(\.\d{5,})
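As a quick check of the lookbehind version in Ruby, here is a minimal sketch. The constant `LOOKBEHIND` and the method `lookbehind_matches` are names made up for this illustration. Note that in Ruby `^` matches at the start of every line; if only the start of the whole string should count, `\A` could be used instead.

```ruby
# Lookbehind version: the preceding whitespace is checked but not consumed.
LOOKBEHIND = /(?<=^|\s)(\.\d{5,})/

# Hypothetical helper: returns every decimal that starts a line
# or follows a whitespace character.
def lookbehind_matches(text)
  # With one capture group, scan returns one-element arrays; flatten them.
  text.scan(LOOKBEHIND).flatten
end

p lookbehind_matches('.12345 .123456 NOT.1234567')
# prints [".12345", ".123456"]
```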

We could simply change that lookbehind to a normal capture group.
That means a preceding whitespace character also gets captured, as group 1. In a replacement we can then choose whether or not to put group 1 back.

(^|\s)(\.\d{5,})
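A small sketch of how the two groups could be used in a Ruby replacement: group 1 (the start of line or the whitespace) is written back unchanged, and only group 2 is altered. The names `TWO_GROUPS` and `bracket_decimals`, and the choice of wrapping the number in brackets, are assumptions for the sake of the example.

```ruby
# Group 1 is the start of line or a whitespace character, group 2 the number.
TWO_GROUPS = /(^|\s)(\.\d{5,})/

# Hypothetical helper: wraps each matching decimal in brackets while
# putting the captured whitespace (group 1) back in front of it.
def bracket_decimals(text)
  text.gsub(TWO_GROUPS, '\1[\2]')
end

puts bracket_decimals('.12345 .123456 NOT.1234567')
# prints [.12345] [.123456] NOT.1234567
```

Leaving `\1` out of the replacement string would drop the whitespace before each match.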

The PCRE regex engine has \K, and Ruby's regex engine (Ruby 2.0 and later) supports it as well:

\K : resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match

So with \K in the regex, the preceding whitespace is consumed but isn't included in the match:

(?:^|\s)\K(\.\d{5,})
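In Ruby, the effect of \K can be seen with both `scan` and `gsub`. This is only a sketch: the names `KEEP_OUT`, `keep_out_matches` and `mask_decimals` are invented here, and the capture group is dropped so that `scan` returns the whole match.

```ruby
# Everything matched before \K is consumed but left out of the reported match.
KEEP_OUT = /(?:^|\s)\K\.\d{5,}/

# Hypothetical helper: without capture groups, scan returns the matches
# themselves, and thanks to \K they carry no leading whitespace.
def keep_out_matches(text)
  text.scan(KEEP_OUT)
end

# Hypothetical helper: gsub replaces only the part after \K, so the
# whitespace in front of each number stays in place.
def mask_decimals(text)
  text.gsub(KEEP_OUT, '#')
end

p keep_out_matches('.12345 .123456 NOT.1234567')
# prints [".12345", ".123456"]
puts mask_decimals('.12345 .123456 NOT.1234567')
# prints # # NOT.1234567
```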


However, what if you use Ruby's scan with a regex that has capture groups?
Then it only returns the contents of the capture groups (...), not what is matched by non-capturing groups (?:...) or by anything outside a capture group.

For example:

m = '.12345 .123456 NOT.1234567'.scan(/(?:^|\s)(\.\d{5,})/)
=> [[".12345"], [".123456"]]

m = 'ab123cd'.scan(/[a-z]+(\d+)(?:[a-z]+)/)
=> [["123"]]

So when you use scan with a capture group, no lookaround is needed: the leading whitespace is matched but never reported.

LukStorms