Why does the first alternation not match?

Question

I have the following regular expression (Python), and there is one point I don't understand: why doesn't it match the first alternation, too?

Regex (spaced for better understanding):

(?:
  \$\{
    (?P<braced>
       [_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+
    )
  \}
)
|   ### SECOND ALTERNATION ###
(?:
  \$
   (?P<named>
     [_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+
   )
)

Test String:

asdasd $asd:sd + ${asd123:asd} $HOME $$asd

Matched stuff:

asdasd $asd:sd + ${asd123:asd} $HOME $$asd

According to the regex pattern above, the first alternation should also appear, namely:

${asd123:asd}

It seems I don't quite get the alternation pattern?

Do you mean you want to capture `${...}`, too? Have a look at https://regex101.com/r/uR4hJ9/1. You have non-capturing groups `(?:)`; when you remove the `?:`, you turn on capturing. – Wiktor Stribiżew, May 05 '15 at 11:09
Not yet: I don't get the difference. It seems that only the newlines change the syntax? And you removed the `(?:`, i.e. the non-capturing group? I would like to understand it. – Gabriel, May 05 '15 at 11:14
There is a good post on non-capturing groups: http://stackoverflow.com/questions/3512471/non-capturing-group – Wiktor Stribiżew, May 05 '15 at 11:16
Where is the difference from yours? It's the same, but it does not match: https://www.regex101.com/r/eL1lH0/1 – Gabriel, May 05 '15 at 11:20
I posted my answer. Please read the information about non-capturing groups at regular-expressions.info; the main point is that we can make capturing groups optional with non-capturing groups. By making them capturing, we can extract/find/match the data we need. – Wiktor Stribiżew, May 05 '15 at 11:21
@Gabriel: The difference is the flag `x`, which allows free-spacing so that you can format your regex. I don't think there is a need to remove the capturing group, since it seems that the point of the regex is to ignore the `$` and the surrounding `${}`. – nhahtdh, May 05 '15 at 11:24
Your question is a little bit confusing. Are you also having trouble getting all matches in Python, or just problems with the display on the website? – RedX, May 05 '15 at 11:26
Yes, I still have trouble matching the following in Python. I thought that asking about a stupid regex would be too stupid a question, so I posted this question, whose answer is straightforward. The related question is posted here: http://stackoverflow.com/questions/30053885/custom-python-template-string – Gabriel, May 05 '15 at 13:10

score 2 · Accepted Answer · edited May 23 '17 at 12:14

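In order to capture ${...}, you need to remove ?: to turn non-capturing groups into capturing ones. You can make them named as well. Also, [_a-zA-Z0-9] is equal to \w for ASCII input (in Python 3, \w in a str pattern also matches non-ASCII word characters unless re.ASCII is set), thus we can shorten your regex a bit: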

(?P<Alternation1>
 \$\{(?P<braced>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)
 \}
 )
 |
 (?P<Alternation2>
  \$(?P<named>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+
 )
)

Have a look at the demo. This regex requires the x option (plus the g option on regex101.com to show all matches; in Python, you'd use findall or finditer instead).
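To illustrate why the x option matters, here is a minimal sketch. The pattern is a deliberately simplified, hypothetical one (not the regex from the question), and the variable names are made up for the example. It compares the same spaced-out pattern with and without re.VERBOSE:

```python
import re

# Hypothetical, simplified pattern: '$' followed by a lowercase name.
SPACED = r"""
    \$                  # literal dollar sign
    (?P<name> [a-z]+ )  # the name after it
"""

# Without re.VERBOSE, the spaces, newlines and '#' text are literal characters.
plain = re.search(SPACED, "$home")

# With re.VERBOSE, whitespace and '#' comments in the pattern are ignored.
verbose = re.search(SPACED, "$home", re.VERBOSE)

print(plain)                  # None
print(verbose.group("name"))  # home
```

This is presumably the same effect as toggling the x flag on regex101.com: the spaced version only behaves as intended when free-spacing is enabled.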

More information about non-capturing groups is available on SO and at regular-expressions.info.
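As a small sketch of the capturing vs. non-capturing distinction, the two patterns below are simplified, hypothetical versions of the first alternation (the group name `whole` is made up for the example). They differ only in the type of the outer group:

```python
import re

text = "${asd123:asd}"

# Simplified patterns; only the outer group differs.
non_capturing = re.compile(r"(?:\$\{(?P<braced>[\w:]+)\})")
capturing = re.compile(r"(?P<whole>\$\{(?P<braced>[\w:]+)\})")

m1 = non_capturing.search(text)
m2 = capturing.search(text)

# Both patterns match the same text ...
print(m1.group(0))     # ${asd123:asd}
print(m2.group(0))     # ${asd123:asd}

# ... but only the capturing version stores the outer group.
print(m1.groupdict())  # {'braced': 'asd123:asd'}
print(m2.groupdict())  # {'whole': '${asd123:asd}', 'braced': 'asd123:asd'}
```

In other words, a non-capturing group does not prevent a match; it only affects which groups are available afterwards.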

To just get all matches in Python, you can use findall like this:

import re

# Verbose-mode pattern: whitespace and line breaks inside it are ignored.
p = re.compile(r'''(?P<Alternation1>
     \$\{(?P<braced>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)
     \}
     )
     |
     (?P<Alternation2>
      \$(?P<named>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+
     )
    )
''', re.VERBOSE)
test_str = "asdasd $asd:sd + ${asd123:asd} $HOME $$asd"

# findall returns one tuple per match, holding the contents of all four groups.
print(p.findall(test_str))

See IDEONE demo

Casimir et Hippolyte · Answer 2 · 2015-05-05T11:58:56.587

Your pattern works well; all you need is to use it with finditer to perform a global search and obtain the whole match:

>>> for m in re.finditer(pattern, text):
...     print('whole match: %s' % m.group(0))
...     print('group "braced": %s' % m.group('braced'))
...     print('group "named": %s\n' % m.group('named'))

(The problem with findall, which also performs a global search, is that when the pattern contains capture groups, the result is only a list of the capture groups' contents and no longer includes the whole match. So enclosing everything in a capture group, as suggested by stribizhev, can be a way to make it work with findall.)
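The difference between the two functions can be sketched with the pattern and test string from the question. The variable names `groups_only` and `whole` are chosen for this example only:

```python
import re

# The question's pattern, compiled in verbose (free-spacing) mode.
pattern = re.compile(r"""
    (?: \$\{ (?P<braced> [_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+ ) \} )
    |
    (?: \$   (?P<named>  [_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+ ) )
""", re.VERBOSE)
text = "asdasd $asd:sd + ${asd123:asd} $HOME $$asd"

# findall: one tuple per match, containing only the capture groups.
groups_only = pattern.findall(text)

# finditer: match objects, so the whole match is available via group(0).
whole = [m.group(0) for m in pattern.finditer(text)]

print(groups_only)  # [('', 'asd:sd'), ('asd123:asd', '')]
print(whole)        # ['$asd:sd', '${asd123:asd}']
```

Note that `$HOME` and `$$asd` do not appear in either result, because the pattern requires at least one `:part` after the first identifier.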

Konstantin · Answer 3 · 2015-05-05T11:28:18.873

You need to add the g modifier to get all matches on regex101.com:

https://www.regex101.com/r/nP8pK0/1
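As a rough Python analogy (an assumption about how the site behaves, not something taken from regex101's documentation): matching without g resembles re.search, which stops at the first match, while matching with g resembles finditer. The helper name `NAME` is introduced here only to keep the pattern on one line:

```python
import re

# Hypothetical helper: the identifier part shared by both alternations.
NAME = r"[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+"
pattern = re.compile(r"\$\{(?P<braced>" + NAME + r")\}|\$(?P<named>" + NAME + r")")
text = "asdasd $asd:sd + ${asd123:asd} $HOME $$asd"

# search: only the first match in the string (second alternation here).
first = pattern.search(text).group(0)

# finditer: every non-overlapping match, including the first alternation.
all_matches = [m.group(0) for m in pattern.finditer(text)]

print(first)        # $asd:sd
print(all_matches)  # ['$asd:sd', '${asd123:asd}']
```

Seen this way, the first alternation does match; it is simply not the first match in the test string.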

edited May 05 '15 at 11:28

answered May 05 '15 at 11:17


Maybe that's how it works in other languages, but Python's regex syntax doesn't work that way. – user2357112 May 05 '15 at 11:18
You are correct, there is no `g` modifier in Python regular expressions, because we use `findall` to get all matches. OP's question is about not getting a match on regex101.com site – Konstantin May 05 '15 at 11:22
This is only good for demonstration at regex101, no actual use in Python. – Wiktor Stribiżew May 05 '15 at 11:22
Thanks for the help. I was basically interested in how to match the following: http://stackoverflow.com/questions/30053885/custom-python-template-string – Gabriel May 05 '15 at 13:06
