4

I spent a lot of time on another Stack Overflow question, and the same problem came up that I hit on a previous one. Non-capturing groups aren't working the way I'd expect them to, or so I believe.

This is a silly example along the lines of someone else's CSS test string...

Here's my regex:

(?:(rgb\([^)]*\)|\S+)(?:[ ]+)?)*

And here's the test string:

1px solid rgb(255, 255, 255) test rgb(255, 255, 255)

I'm expecting match groups of "1px", "solid", "rgb(255, 255, 255)", "test", "rgb(255, 255, 255)".

But I'm only getting the last token matched.

This is the link for testing:

http://regex101.com/r/pK1uG7

What's going wrong here? I thought I had non-capturing groups down, and the way it's explained at the bottom of regex101 makes sense, including the "greediness".

sdanzig

2 Answers

3

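A repeated capture group keeps only its most recent match; each iteration overwrites the previous one. Capture group #1 first matches "1px", then matches "solid", overwriting "1px", then matches "rgb(255, 255, 255)", overwriting "solid", and so on. That is why only the last token is left when the match finishes.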
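
The overwriting can be reproduced outside regex101. Below is a minimal sketch using Python's standard `re` module (the choice of Python is an assumption; the question does not name a language). It runs the question's pattern against the question's test string and shows that the whole string is matched while group 1 holds only the final token.

```python
import re

# The pattern and test string from the question, unchanged.
pattern = re.compile(r"(?:(rgb\([^)]*\)|\S+)(?:[ ]+)?)*")
text = "1px solid rgb(255, 255, 255) test rgb(255, 255, 255)"

m = pattern.match(text)
whole_match = m.group(0)   # everything the repeated outer group consumed
last_capture = m.group(1)  # only the final iteration's capture survives

print(whole_match)   # 1px solid rgb(255, 255, 255) test rgb(255, 255, 255)
print(last_capture)  # rgb(255, 255, 255)
```

The outer non-capturing group does repeat five times; it is the single capturing group inside it that has only one slot to store a result.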

Graeme
  • Darn it, regexes would have been much more fun if you could repeat capturing groups. You're right though. Oh well. – sdanzig Oct 25 '13 at 22:48
  • There are a lot of different rules for different regex languages. You might be able to in one of them: http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines – Plasmarob Oct 25 '13 at 22:50
  • As others have said, you can use the g (global) option to accomplish what you're trying to do. The problem has nothing to do with the non-capturing groups. – Graeme Oct 25 '13 at 22:56
  • @graeme, /g isn't a silver bullet though. You'd have a lot more flexibility if you can specify how many matches of a group to capture. For instance, one of these, two of these, and 3-5 of those. While you can capture a repeated expression, you then have to further parse the sub-expression on a second pass. Which is understandable, but it's the poop in my regex party. – sdanzig Oct 25 '13 at 23:01
  • @sdanzig Only .NET can capture repeated groups, see this [extensive answer](http://stackoverflow.com/a/17004406). I've also [stumbled onto this comment](http://stackoverflow.com/questions/13587023/capture-multiple-subgroups-of-repeated-group#comment18631689_13587023) which says that a certain implementation of Python supports it too. – HamZa Oct 26 '13 at 09:08
  • @HamZa Well, I'd never choose .NET willingly, but that Python library makes me like Python just a bit more :) But I don't want to make a habit of something that's generally not supported. Regex muscle-memory should be portable. I also worry about performance implications of supporting that. – sdanzig Oct 26 '13 at 14:36
2

For this you would want to use the global option:

/(rgb\([^)]+\)|\S+)/g

http://regex101.com/r/kF2uV4
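
As a rough equivalent of the `/g` flag, here is a small sketch in Python (an assumed language; the answer itself uses the `/.../g` literal syntax). `re.findall` scans the string for every non-overlapping match, so each token becomes its own match instead of one repeated group trying to hold them all.

```python
import re

text = "1px solid rgb(255, 255, 255) test rgb(255, 255, 255)"

# Same pattern as in the answer above, without the /.../g delimiters.
# With exactly one capturing group, findall returns that group's text
# for each match found in the string.
tokens = re.findall(r"(rgb\([^)]+\)|\S+)", text)

print(tokens)
# ['1px', 'solid', 'rgb(255, 255, 255)', 'test', 'rgb(255, 255, 255)']
```

The `rgb(...)` alternative is listed first so that it wins over `\S+`, which would otherwise stop at the first space inside the parentheses.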

Non-capturing groups leave their results out of the numbered groups: (?:...) groups part of the pattern without storing what it matched. So if you want to match:

"1px", "solid", "rgb(255, 255, 255)", "test", "rgb(255, 255, 255)"

Then you don't want to use capturing groups that way.

See: What is a non-capturing group? What does a question mark followed by a colon (?:) mean?

See the answer of Ricardo Nolde at the top. You're eliminating the ones you say you want back.

Plasmarob
  • Yeah, my misunderstanding didn't have anything to do with non-capturing groups. @graeme couldn't have been any more on point than he was :) – sdanzig Oct 25 '13 at 22:50
  • I'd kill my post, but your title might bring people that need it. It's a good question you asked. – Plasmarob Oct 25 '13 at 22:52