Assume that we have strings such as '{ CAN_READ, CAN_WRITE }' and '{ CAN_READ, CAN_WRITE, CAN_REMOVE }' and want to extract the elements (CAN_READ, CAN_WRITE, CAN_REMOVE). The number of elements is arbitrary. I am trying to solve this with Python's regular expression module (re).

The regular expression I designed is like this: r'^\{(\s*[a-zA-Z0-9_]+\s*,?)+\s*\}$'

I think this regexp is correct because re.match succeeds. However, although I expected to get all the elements from the groups() method of the result, it returns only the last one.

e.g.

>>> value='{ CAN_READ, CAN_WRITE }'
>>> re.match(r'^\{(\s*[a-zA-Z0-9_]+\s*,?)+\s*\}$', value).groups()
(' CAN_WRITE ',)
>>> value='{ CAN_READ, CAN_WRITE, CAN_REMOVE }'
>>> re.match(r'^\{(\s*[a-zA-Z0-9_]+\s*,?)+\s*\}$', value).groups()
(' CAN_REMOVE ',)

To check that the block (\s*[a-zA-Z0-9_]+\s*,?) is correct, I repeated it twice, and it worked:

>>> value='{ CAN_READ, CAN_WRITE }'
>>> re.match(r'^\{(\s*[a-zA-Z0-9_]+\s*,?)(\s*[a-zA-Z0-9_]+\s*,?)\s*\}$', value).groups()
(' CAN_READ,', ' CAN_WRITE ')

However, this works only when there are exactly two elements.

How can I get all repeated blocks?

martineau
Akihiko
  • As usual: if you can install the PyPI regex library, use `match.captures(1)`; otherwise, do it in two steps: a) validate with `re.fullmatch`, then b) extract the matches using either another regex or split plus a bit of string manipulation code. If you do not care about the string format validity, use `re.findall()` with an unanchored pattern that matches what is inside the braces (here, `re.findall(r'\w+', text)`). – Wiktor Stribiżew Jul 28 '21 at 17:52
  • Use [`re.findall()`](https://docs.python.org/3/library/re.html#re.findall). – martineau Jul 28 '21 at 17:52
  • I tried both the two-step approach and `re.findall()`, and both worked. `re.findall()` looks easier in my case. Thanks! – Akihiko Jul 28 '21 at 17:56
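
For reference, the two-step approach from the comments might look roughly like the sketch below. With the standard `re` module, a group that repeats keeps only its last match, which is why `groups()` returned a single element. The names `SET_PATTERN`, `ELEMENT_PATTERN` and `extract_elements` are made up for illustration. The validation pattern is the one from the question with the group made non-capturing, and returning `None` for invalid input is just one possible choice.

```python
import re

# Hypothetical names, for illustration only.
# Validation pattern from the question; the group is non-capturing because
# it is used only to check the overall format, not to extract anything.
SET_PATTERN = re.compile(r'\{(?:\s*[a-zA-Z0-9_]+\s*,?)+\s*\}')
# Pattern for a single element.
ELEMENT_PATTERN = re.compile(r'[a-zA-Z0-9_]+')


def extract_elements(text):
    """Return the element names in text, or None if the format is invalid."""
    # Step 1: validate the whole string (fullmatch anchors at both ends).
    if SET_PATTERN.fullmatch(text) is None:
        return None
    # Step 2: findall returns every non-overlapping match,
    # not just the last repetition.
    return ELEMENT_PATTERN.findall(text)


print(extract_elements('{ CAN_READ, CAN_WRITE }'))
print(extract_elements('{ CAN_READ, CAN_WRITE, CAN_REMOVE }'))
print(extract_elements('CAN_READ, CAN_WRITE'))  # no braces, so None
```

If the format does not need to be validated, step 2 alone (`re.findall(r'\w+', text)`, as suggested in the comments) is enough.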
