
I am trying to write my own syntax highlighter in Sublime Text. I think it uses Python-style regular expressions. I just want to match all tokens in a line like:

description str.bla, str.blub, str.yeah, str.no

My regular expression looks like:

regex = "(description) (str\\.[\\w\\d]+)(,\\s*(str\\.[\\w\\d]+))*"

Now I expect 1 match in group 1 ("description"), 1 match in group 2 ("str.bla"), and 3 matches in group 4 ("str.blub", "str.yeah", "str.no"), but I get only 1 match in the last group ("str.no"). What's going on there?

Thanks a lot!

John Rumpel

2 Answers


Try this:

regex = "(description) (str\\.[\\w\\d]+)((?:,\\s*(?:str\\.[\\w\\d]+))*)"
Alex Filipovici

When you have a repeated capture group (e.g. (a)* or (a)+), the capture group will contain only the last match.

So, if I have the regex:

(123\d)+

And the string:

123412351236

You will find that the capture group will contain only 1236.
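To see this behaviour outside the editor, here is a minimal sketch using Python's `re` module. This assumes Python's engine behaves like Sublime's on this point, which is true for most regex flavours. The variable names are only for illustration.

```python
import re

# A repeated capture group is overwritten on every iteration,
# so only the final repetition is left in group 1.
match = re.fullmatch(r"(123\d)+", "123412351236")
last_capture = match.group(1)
print(last_capture)  # 1236

# To collect every repetition, match the repeated unit on its own.
all_captures = re.findall(r"123\d", "123412351236")
print(all_captures)  # ['1234', '1235', '1236']
```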

I don't know any way around this (besides hard coding the number of subgroups to capture), but you can try capturing the whole group like so:

regex = "(description) (str\\.[\\w\\d]+)((?:,\\s*(?:str\\.[\\w\\d]+))*)"

Which should give you

['description', 'str.bla', ', str.blub, str.yeah, str.no']

Note how the elements are grouped: you have 3 items in the list, and the last one is a single string holding all the remaining comma-separated tokens, which you would still need to split yourself.
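If you can post-process the match in code, the third group can be split in a second pass. The following is a rough sketch in plain Python, not a Sublime syntax definition; it writes `\w` instead of `[\w\d]`, which is equivalent because `\w` already includes digits.

```python
import re

line = "description str.bla, str.blub, str.yeah, str.no"
pattern = r"(description) (str\.\w+)((?:,\s*str\.\w+)*)"

match = re.match(pattern, line)
keyword, first, rest = match.groups()

# rest is one string: ', str.blub, str.yeah, str.no'
# A second pass pulls the individual tokens out of it.
remaining = re.findall(r"str\.\w+", rest)
tokens = [first] + remaining
print(tokens)  # ['str.bla', 'str.blub', 'str.yeah', 'str.no']
```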

Jerry
  • Thanks, this is a nice workaround. By the way: ?: (non-capturing group) is not implemented in Sublime – John Rumpel Aug 14 '13 at 14:38
  • @JohnRumpel Hmm, this means you'll be getting multiple captures, some of which won't be relevant. See [how it will be working](http://www.regex101.com/r/bS8tF9). – Jerry Aug 14 '13 at 14:42