
I have a Python regular expression that contains a group which can occur zero or many times - but when I retrieve the list of groups afterwards, only the last one is present. Example:

re.search(r"(\w)*", "abcdefg").groups()

This returns the tuple ('g',)

I need it to return ('a','b','c','d','e','f','g',)

Is that possible? How can I do it?

Brian Tompsett - 汤莱恩
John B

2 Answers

re.findall(r"\w","abcdefg")
Douglas Leeder

In addition to Douglas Leeder's solution, here is the explanation:

In regular expressions, the group count is fixed. Placing a quantifier after a group does not increase the group count (imagine how all the later group indexes would have to shift if an earlier group matched more than once).

Groups with quantifiers are a way of making a complex sub-expression atomic when it needs to be matched more than once. The regex engine has no choice but to save only the last match to the group. In short: there is no way to achieve what you want with a single "unarmed" regular expression, and you have to find another way.
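To make this concrete, here is a minimal sketch using only the standard `re` module. It shows that the group count belongs to the pattern rather than to the input, and then one possible "other way": match the whole run first and split it in a second pass. The variable names are illustrative only.

```python
import re

pattern = re.compile(r"(\w)*")
match = pattern.search("abcdefg")

# The number of groups is a property of the compiled pattern,
# no matter how many times the quantified group repeats.
print(pattern.groups)   # 1
print(match.groups())   # ('g',)  only the last repetition is kept
print(match.group(0))   # abcdefg  the overall match is still complete

# Workaround: take the overall match and split it in a second pass.
letters = tuple(re.findall(r"\w", match.group(0)))
print(letters)          # ('a', 'b', 'c', 'd', 'e', 'f', 'g')
```

For a pattern this simple, the second pass alone (the `re.findall` call from the other answer) is enough; the two-step form is mainly useful when the repeated group is embedded in a larger expression.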

Tomalak
  • As an addition: Modern regex implementations like the one in .NET allow you to access previous occurrences of a group besides the last one. Therefore, the above statement is not universally true, but it still holds for most implementations. – Tomalak Jun 07 '11 at 18:21
  • For the record, there's a regex implementation for Python which also permits access to all of the matches of a capture group: http://pypi.python.org/pypi/regex – MRAB Sep 03 '12 at 01:18
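As a rough illustration of the comment above, the third-party `regex` package exposes a `captures()` method on match objects that returns every repetition of a group. The sketch below assumes the package has been installed (`pip install regex`); `all_captures` is a hypothetical helper name, not part of either library, and it returns `None` when the package is missing.

```python
def all_captures(pattern, text, group=1):
    """Return every capture of `group`, or None if `regex` is unavailable."""
    try:
        import regex  # third-party package, not the standard `re` module
    except ImportError:
        return None
    match = regex.search(pattern, text)
    # captures() lists all repetitions of the group, not just the last one.
    return match.captures(group) if match else None


result = all_captures(r"(\w)*", "abcdefg")
print(result)
```

With `regex` installed this should print the list of all seven letters; the standard `re` module has no equivalent method.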