
I have the following regular expression (Python), and there is one point I don't understand: why doesn't it match the first alternative, too?

Regex (spaced for better understanding):

(?:
  \$\{
    (?P<braced>
       [_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+
    )
  \}
)
|   ### SECOND ALTERNATION ###
(?:
  \$
   (?P<named>
     [_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+
   )
)

Test String:

asdasd $asd:sd + ${asd123:asd} $HOME $$asd

Matched stuff:

asdasd $asd:sd + ${asd123:asd} $HOME $$asd

According to the pattern above, the first alternative should also match, namely:

${asd123:asd}

Am I misunderstanding how alternation works?

Wiktor Stribiżew
Gabriel
  • Do you mean you want to capture `${...}`, too? Have a look at https://regex101.com/r/uR4hJ9/1. You have non-capturing groups `(?:)`; when you remove the `?:`, you turn on capturing. – Wiktor Stribiżew May 05 '15 at 11:09
  • Yes, please :-) Thanks! Let's see what's different. – Gabriel May 05 '15 at 11:10
  • Does that answer your question? – Wiktor Stribiżew May 05 '15 at 11:11
  • Not yet: I don't get the difference. It seems that only the newlines change the syntax? And you removed the `(?:` (non-capturing group)? I would like to understand it. – Gabriel May 05 '15 at 11:14
  • There is a good post on non-capturing groups: http://stackoverflow.com/questions/3512471/non-capturing-group – Wiktor Stribiżew May 05 '15 at 11:16
  • Where is the difference from yours? It's the same, but it does not match: https://www.regex101.com/r/eL1lH0/1 – Gabriel May 05 '15 at 11:20
  • I posted my answer. Please read the information about non-capturing groups at regular-expressions.info. The main point is that we can make capturing groups optional with non-capturing groups. By making them capturing, we can extract/find/match the data we need. – Wiktor Stribiżew May 05 '15 at 11:21
  • @Gabriel: The difference is the `x` flag, which allows free-spacing so that you can format your regex. I don't think there is a need to remove the capturing group, since it seems that the point of the regex is to ignore the `$` and the surrounding `${}`. – nhahtdh May 05 '15 at 11:24
  • Your question is a little confusing. Are you also having trouble getting all matches in Python, or is the problem just the display on the website? – RedX May 05 '15 at 11:26
  • Yes, I still have trouble matching this in Python. I thought a question about a simple regex would be too stupid, so I posted this question, whose answer is straightforward. The related question I posted is here: http://stackoverflow.com/questions/30053885/custom-python-template-string – Gabriel May 05 '15 at 13:10
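As nhahtdh's comment points out, the difference between the two regex101 links is the `x` (free-spacing) flag. In Python that flag is `re.VERBOSE` (alias `re.X`). A minimal sketch, reusing the question's spaced-out pattern and test string unchanged, compares running it with and without the flag:

```python
import re

# The question's pattern, copied with its original line breaks and spacing.
SPACED = r"""
(?:
  \$\{
    (?P<braced>
       [_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+
    )
  \}
)
|   ### SECOND ALTERNATION ###
(?:
  \$
   (?P<named>
     [_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+
   )
)
"""

TEXT = "asdasd $asd:sd + ${asd123:asd} $HOME $$asd"

# With re.VERBOSE, whitespace and '#' comments in the pattern are ignored,
# so the layout is purely cosmetic.
verbose_hits = [m.group(0) for m in re.finditer(SPACED, TEXT, re.VERBOSE)]

# Without it, every newline and space in the pattern must match literally,
# and the test string contains no newlines, so nothing matches.
plain_hits = [m.group(0) for m in re.finditer(SPACED, TEXT)]

print(verbose_hits)  # ['$asd:sd', '${asd123:asd}']
print(plain_hits)    # []
```

With the flag, both alternatives match; `$HOME` and `$$asd` are skipped because the pattern requires at least one `:part`.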

3 Answers


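In order to capture `${...}`, you need to remove `?:` to turn the non-capturing groups into capturing ones. You can name them as well. Also, `[_a-zA-Z0-9]` is equivalent to `\w` (for ASCII input; in Python 3, `\w` also matches non-ASCII letters and digits unless you pass `re.ASCII`), so we can shorten your regex a bit: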

(?P<Alternation1>
  \$\{
    (?P<braced>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)
  \}
)
|
(?P<Alternation2>
  \$
    (?P<named>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)
)

Have a look at the demo. This regex requires the x (free-spacing) option. On regex101.com you also need the g option to show all matches; in Python, you'd use findall or finditer instead.

More information about non-capturing groups is available on SO and at regular-expressions.info.
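To see concretely what removing `?:` changes, here is a small sketch that uses only the braced alternative and compares what `re.findall` returns with a non-capturing versus a capturing outer group. `NAME` is a hypothetical helper variable holding the shared name sub-pattern, introduced only for this illustration:

```python
import re

TEXT = "asdasd $asd:sd + ${asd123:asd} $HOME $$asd"
# Shared name sub-pattern (hypothetical helper); (?:...) inside adds no group.
NAME = r"[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+"

# Non-capturing wrapper: the only group is 'braced', so findall returns
# a plain list of the inner names.
non_capturing = re.findall(r"(?:\$\{(?P<braced>" + NAME + r")\})", TEXT)

# Capturing wrapper: the whole ${...} text becomes group 1, so findall
# returns one (whole, braced) tuple per match.
capturing = re.findall(r"(\$\{(?P<braced>" + NAME + r")\})", TEXT)

print(non_capturing)  # ['asd123:asd']
print(capturing)      # [('${asd123:asd}', 'asd123:asd')]
```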

To get all matches in Python, you can use findall like this:

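import re

# re.VERBOSE (the x flag) ignores whitespace and newlines inside the pattern
p = re.compile(r'''
    (?P<Alternation1>
      \$\{
        (?P<braced>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)
      \}
    )
    |
    (?P<Alternation2>
      \$
        (?P<named>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)
    )
''', re.VERBOSE)
test_str = "asdasd $asd:sd + ${asd123:asd} $HOME $$asd"

# One tuple per match: (Alternation1, braced, Alternation2, named)
# [('', '', '$asd:sd', 'asd:sd'), ('${asd123:asd}', 'asd123:asd', '', '')]
print(p.findall(test_str))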

See IDEONE demo
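Because the pattern in this answer has four groups, `findall` returns a 4-tuple per match, with empty strings for the alternative that did not participate. A small sketch of post-processing those tuples into either the whole matches or just the names (the variable names are illustrative, not part of the original answer):

```python
import re

PATTERN = re.compile(r"""
    (?P<Alternation1> \$\{ (?P<braced>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+) \} )
    |
    (?P<Alternation2> \$ (?P<named>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+) )
""", re.VERBOSE)

TEXT = "asdasd $asd:sd + ${asd123:asd} $HOME $$asd"

# Each tuple is (Alternation1, braced, Alternation2, named); groups from the
# alternative that did not match come back as empty strings, so `or` picks
# whichever one is filled in.
whole = [alt1 or alt2 for alt1, _, alt2, _ in PATTERN.findall(TEXT)]
names = [braced or named for _, braced, _, named in PATTERN.findall(TEXT)]

print(whole)  # ['$asd:sd', '${asd123:asd}']
print(names)  # ['asd:sd', 'asd123:asd']
```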

Wiktor Stribiżew

Your pattern works; all you need is to use it with finditer to perform a global search and obtain the whole match:

>>> for m in re.finditer(pattern, text):
...     print('whole match: %s' % m.group(0))
...     print('group "braced": %s' % m.group('braced'))
...     print('group "named": %s\n' % m.group('named'))

(The problem with findall, which also performs a global search, is that when the pattern contains capture groups, the result contains only the contents of those groups and no longer the whole match. So, as stribizhev suggested, enclosing everything in a capture group is one way to make it work with findall.)
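A runnable Python 3 sketch of this idea: it compiles the question's pattern (its `(?:...)` wrappers kept) with `re.VERBOSE`, since the spaced-out layout needs it, and contrasts what `finditer` and `findall` return for the question's test string:

```python
import re

# The question's pattern, compiled with re.VERBOSE so the spacing is ignored.
pattern = re.compile(r"""
    (?: \$\{ (?P<braced>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+) \} )
    |
    (?: \$   (?P<named>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+) )
""", re.VERBOSE)

text = "asdasd $asd:sd + ${asd123:asd} $HOME $$asd"

# finditer yields match objects, so group(0) (the whole match) is available
# even though the pattern has capture groups; a group that did not take
# part in the match returns None.
results = []
for m in pattern.finditer(text):
    results.append((m.group(0), m.group("braced"), m.group("named")))

# findall, by contrast, returns only the two named groups per match,
# using '' for the group that did not participate.
groups_only = pattern.findall(text)

for whole, braced, named in results:
    print(f"whole={whole!r} braced={braced!r} named={named!r}")
print(groups_only)  # [('', 'asd:sd'), ('asd123:asd', '')]
```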

Casimir et Hippolyte

You need to add the g modifier to get all matches on regex101.com:

https://www.regex101.com/r/nP8pK0/1

Konstantin
  • Maybe that's how it works in other languages, but Python's regex syntax doesn't work that way. – user2357112 May 05 '15 at 11:18
  • You are correct: there is no `g` modifier in Python regular expressions, because we use `findall` to get all matches. The OP's question is about not getting a match on the regex101.com site. – Konstantin May 05 '15 at 11:22
  • This is only useful for demonstration on regex101; it has no actual use in Python. – Wiktor Stribiżew May 05 '15 at 11:22
  • Thanks for the help. I was basically interested in how to match the following: http://stackoverflow.com/questions/30053885/custom-python-template-string – Gabriel May 05 '15 at 13:06
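Since Python has no `g` flag, the single-match versus all-matches distinction lives in which function you call. A minimal sketch, with the question's pattern written on one line so no `re.VERBOSE` is needed:

```python
import re

# The question's two alternatives, joined on one line (implicit string concatenation).
PATTERN = re.compile(
    r"\$\{(?P<braced>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+)\}"
    r"|\$(?P<named>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+)"
)
TEXT = "asdasd $asd:sd + ${asd123:asd} $HOME $$asd"

# re.search stops at the first match, much like regex101 without the g flag.
first = PATTERN.search(TEXT).group(0)

# finditer keeps scanning the whole string, which is what g does on regex101.
every = [m.group(0) for m in PATTERN.finditer(TEXT)]

print(first)  # $asd:sd
print(every)  # ['$asd:sd', '${asd123:asd}']
```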