
Why doesn't the regex (?<=fo).* match foo (whereas (?<=f).* does)?

"foo" =~ /(?<=f).*/m          => 1
"foo" =~ /(?<=fo).*/m         => nil

This seems to happen only with the m flag turned on (Ruby calls this "multiline" mode; it makes the dot match newlines, which other regex flavors call singleline mode); without it, everything is OK:

"foo" =~ /(?<=f).*/           => 1
"foo" =~ /(?<=fo).*/          => 2

Tested on Ruby 1.9.3 and 2.0.0.
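
To check whether a particular interpreter is affected, a small script along these lines could be run. It is only a sketch: the `check` helper and the labels are made up for illustration, and no expected results are hard-coded because they depend on the Ruby version in use.

```ruby
# Hypothetical helper: returns the match index (or nil) of a pattern against "foo".
def check(pattern)
  "foo" =~ pattern
end

# The four cases from the question: one- and two-character lookbehind,
# with and without the m flag (dot matches newline).
results = {
  "(?<=f).*  with /m"  => check(/(?<=f).*/m),
  "(?<=fo).* with /m"  => check(/(?<=fo).*/m),
  "(?<=f).*  without"  => check(/(?<=f).*/),
  "(?<=fo).* without"  => check(/(?<=fo).*/)
}

puts "Ruby #{RUBY_VERSION}"
results.each { |label, index| puts "#{label.ljust(20)} => #{index.inspect}" }
```

On an affected version, the second line should print nil where 2 would be expected.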

See it on Rubular

EDIT: Some more observations:

Adding an end-of-line anchor doesn't change anything:

"foo" =~ /(?<=fo).*$/m        => nil

But together with a lazy quantifier, it "works":

"foo" =~ /(?<=fo).*?$/m       => 2

EDIT: And some more observations:

.+ works, as does its equivalent .{1,}, but only in Ruby 1.9 (this seems to be the only behavioral difference between the two versions in this scenario):

"foo" =~ /(?<=fo).+/m         => 2
"foo" =~ /(?<=fo).{1,}/m      => 2

In Ruby 2.0:

"foo" =~ /(?<=fo).+/m         => nil
"foo" =~ /(?<=fo).{1,}/m      => nil

.{0,} is busted (in both 1.9 and 2.0):

"foo" =~ /(?<=fo).{0,}/m      => nil

But {n,m} works in both:

"foo" =~ /(?<=fo).{0,1}/m     => 2
"foo" =~ /(?<=fo).{0,2}/m     => 2
"foo" =~ /(?<=fo).{0,999}/m   => 2
"foo" =~ /(?<=fo).{1,999}/m   => 2
Tim Pietzcker
  • Well, lookbehind assertions *are* a new feature since version 1.9, but it's not like this is a very complicated one... makes you wonder what other bugs there are. – Tim Pietzcker Mar 05 '13 at 21:27
  • If it's a bug, it's in two different regexp engines (1.9 and 2.0.0 don't use the same engine). – Wayne Conrad Mar 05 '13 at 21:29
  • Well the Ruby 2.0 engine is Onigmo, which is a fork of Ruby 1.9's engine Oniguruma. So if it's really a bug, it may well exist in both engines going unnoticed so far. – Patrick Oscity Mar 05 '13 at 21:43
  • Well, I've opened a ticket in the Ruby bug tracker...: http://bugs.ruby-lang.org/issues/8023 – Tim Pietzcker Mar 05 '13 at 21:44
  • In Ruby, 'dot matches all' is _multiline_ mode, and there is no _singleline_ mode as such. – MikeM Mar 05 '13 at 21:57
  • @MikeM: What Ruby calls "multiline" is called "singleline" in every other regex flavor there is. This is confusing enough :) – Tim Pietzcker Mar 05 '13 at 21:59
  • Linked: [How do I create a multiline regex?](http://stackoverflow.com/questions/15233480/how-do-i-create-a-multiline-regex). – MikeM Mar 05 '13 at 23:40
  • @dbenhur: Thanks for the additional observations! I've played with them and found a difference between Ruby 2.0 and 1.9's regex engines in the `.+`/`.{1,}` variants of the regex (see above). – Tim Pietzcker Mar 06 '13 at 06:48
  • I think it would be helpful and easier to read for many others if you removed the irb/pry prompt from the code chunks, and further put the results on the same line as the code like `"foo" =~ /(?<=f).*/m # => 1`. – sawa Mar 06 '13 at 06:53
  • @sawa: Right, thanks, this was getting out of hand :) – Tim Pietzcker Mar 06 '13 at 06:58
  • @WayneConrad: It's a slightly different bug in each version. Specifically, `.+` works in 1.9 and fails in 2.0... – Tim Pietzcker Mar 06 '13 at 21:25
  • Shouldn't your comment on opening a bug tracker be the answer? And this question be closed? [Or even deleted, since it's almost obvious from the start that this is a bug?](http://meta.stackexchange.com/q/158578/171231) It's a great find, but I don't think it's a great SO question. And especially it isn't an *unanswered* one. (so -1 on the question, but +1 on the comment in an attempt to show this "question" as answered.) – Chris Wesseling Mar 07 '13 at 09:03
  • @ChrisWesseling: I agree, but I'm still waiting for any reaction from the Ruby bugtracker. So far, there has been no activity at all. Until that happens, I'm hesitant to close the question. – Tim Pietzcker Mar 07 '13 at 11:20
  • @ChrisWesseling: Also, as is evident from the edits to the question, there have been valuable contributions to the question that helped define the (fairly obvious) bug better; that's something I'm not seeing on the bugtracker either. – Tim Pietzcker Mar 07 '13 at 11:28

1 Answer

This has been officially classified as a bug and subsequently fixed, together with another problem concerning \Z anchors in multiline strings.
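
For readers unfamiliar with the \Z anchor mentioned here, the sketch below shows only its documented behavior next to \z. It does not reproduce the second bug, whose details are not given in this post, and the variable names are illustrative.

```ruby
s = "foo\n"

# \Z matches at the end of the string, or just before a trailing newline.
before_final_newline = s =~ /foo\Z/
# \z matches only at the very end of the string.
strict_end = s =~ /foo\z/

puts "\\Z: #{before_final_newline.inspect}, \\z: #{strict_end.inspect}"
```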

Tim Pietzcker