regex: how to get repeating blocks as groups()?

Question

Assume that we have strings such as '{ CAN_READ, CAN_WRITE }' and '{ CAN_READ, CAN_WRITE, CAN_REMOVE }' and want to extract the elements (CAN_READ, CAN_WRITE, CAN_REMOVE). The number of elements is arbitrary. I am trying to solve this with Python's regular expression module (re).

The regular expression I designed is like this: r'^\{(\s*[a-zA-Z0-9_]+\s*,?)+\s*\}$'

I think this regexp is correct, since re.match succeeds. However, I expected the groups() method of the match object to return all the elements, and it returns only the last one.

e.g.

>>> value='{ CAN_READ, CAN_WRITE }'
>>> re.match(r'^\{(\s*[a-zA-Z0-9_]+\s*,?)+\s*\}$', value).groups()
(' CAN_WRITE ',)
>>> value='{ CAN_READ, CAN_WRITE, CAN_REMOVE }'
>>> re.match(r'^\{(\s*[a-zA-Z0-9_]+\s*,?)+\s*\}$', value).groups()
(' CAN_REMOVE ',)

To check that the block (\s*[a-zA-Z0-9_]+\s*,?) is correct, I wrote it out twice instead of repeating it with +, and it worked:

>>> value='{ CAN_READ, CAN_WRITE }'
>>> re.match(r'^\{(\s*[a-zA-Z0-9_]+\s*,?)(\s*[a-zA-Z0-9_]+\s*,?)\s*\}$', value).groups()
(' CAN_READ,', ' CAN_WRITE ')

However, this works only when there are exactly two elements.

How can I get all repeated blocks?

As usual: if you can install the PyPI regex library, use `match.captures(1)`; otherwise, do it in two steps: a) validate with `re.fullmatch`, then b) extract the matches using either another regex or split plus a bit of string manipulation code. If you do not care about the string format validity, use `re.findall()` with an unanchored pattern that matches what is inside the braces (here, `re.findall(r'\w+', text)`). — Wiktor Stribiżew, Jul 28 '21 at 17:52
Use [`re.findall()`](https://docs.python.org/3/library/re.html#re.findall). — martineau, Jul 28 '21 at 17:52
I tried both the two-step approach and `re.findall()`, and both worked. `re.findall()` looks easier in my case. Thanks! — Akihiko, Jul 28 '21 at 17:56
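
The two standard-library approaches from the comments can be sketched roughly as follows. This is only an illustration: the helper names `extract_loose` and `extract_strict` are made up here, and the validation pattern is a stricter variant of the one in the question (it requires a comma between elements and rejects a trailing comma), which is an assumption about the intended format rather than something stated above.

```python
import re

# One element, using the same character class as in the question.
ELEMENT = r'[a-zA-Z0-9_]+'

# Assumed format: braces around one or more comma-separated elements.
VALID = re.compile(r'\{\s*' + ELEMENT + r'(?:\s*,\s*' + ELEMENT + r')*\s*\}')


def extract_loose(text):
    """Return every word-like token without checking the overall format."""
    return re.findall(r'\w+', text)


def extract_strict(text):
    """Validate the whole string first, then extract the elements."""
    if VALID.fullmatch(text) is None:
        raise ValueError(f'malformed element list: {text!r}')
    return re.findall(ELEMENT, text)


print(extract_loose('{ CAN_READ, CAN_WRITE }'))
# ['CAN_READ', 'CAN_WRITE']
print(extract_strict('{ CAN_READ, CAN_WRITE, CAN_REMOVE }'))
# ['CAN_READ', 'CAN_WRITE', 'CAN_REMOVE']
```

`re.findall()` returns one list item per match of the pattern, so it avoids the behaviour seen in the question, where a capturing group repeated with `+` keeps only its last iteration.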
