
I want to match strings like these:

GH_AP_FB
RB_GO_PY_YL_MI
GB

The general regex rule is ([A-Z]{2})(_[A-Z]{2})*

How can I implement this in Python and still get access to each `[A-Z]{2}` element?

Also, what about patterns with one more level of nesting (groups of groups), for example:

GH_AP_FB-RB_GO_PY_YL_MI-GB
GH_AP-RB_GO_MI-GB_TT-HG-RT-KK_LL_MM
FB

--> [ [GH, AP, FB], [RB, GO, PY, YL, MI], [GB] ]
--> [ [GH, AP], [RB, GO, MI], [GB, TT], [HG], [RT], [KK, LL, MM] ]
--> [ [FB] ]

Is there a better way than using a regex, such as a parser with easy-to-implement rules?

[EDIT]
Please, no answers using `str.split()`, because the separators `_` and `-` may also appear before or after these patterns in a wider string.

  • Why `re.match`? `if re.fullmatch(r'[A-Z]{2}(?:_[A-Z]{2})*', text): print(text.split('_'))` will do if you need to validate the format and then get all items between underscores. If you need to support `-[A-Z]{2}` optional parts, use `r'[A-Z]{2}(?:-[A-Z]{2})?(?:_[A-Z]{2}(?:-[A-Z]{2})?)*'` – Wiktor Stribiżew Mar 10 '20 at 15:13
  • Match either a `-` or a `_` like `[A-Z]{2}(?:[_-][A-Z]{2})*` https://regex101.com/r/5LPPpi/1 – The fourth bird Mar 10 '20 at 15:14
  • Then you must use PyPi `regex` module to match repeated capturing groups. – Wiktor Stribiżew Mar 10 '20 at 15:18
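For what it's worth, the regexes suggested in the comments can probably be combined into a two-pass approach that stays within the standard-library `re` module: first locate the whole nested pattern inside the wider string, then pull the groups and the two-letter codes out of each match with `re.findall` instead of `str.split()`. This is only a minimal sketch. The names `CODE`, `GROUP`, `NESTED` and `find_nested` are made up for illustration, and the lookarounds encode an assumption that is not stated in the question: a match must not be directly preceded or followed by another uppercase letter.

```python
import re

CODE = r"[A-Z]{2}"                 # one two-letter element, e.g. GH
GROUP = rf"{CODE}(?:_{CODE})*"     # elements joined by "_", e.g. RB_GO_PY

# Groups joined by "-". Assumption: a match may not touch another uppercase
# letter on either side, so a word like "HELLO" is not treated as a pattern.
NESTED = re.compile(rf"(?<![A-Z]){GROUP}(?:-{GROUP})*(?![A-Z])")


def find_nested(text):
    """Return one nested list (groups of codes) per pattern found in text."""
    results = []
    for match in NESTED.finditer(text):
        # Second pass: break the matched text into groups, then into codes.
        groups = re.findall(GROUP, match.group())
        results.append([re.findall(CODE, group) for group in groups])
    return results


print(find_nested("GH_AP_FB-RB_GO_PY_YL_MI-GB"))
# [[['GH', 'AP', 'FB'], ['RB', 'GO', 'PY', 'YL', 'MI'], ['GB']]]
```

Because the search uses `finditer`, stray separators around the pattern in a wider string (for example `x_-GH_AP-RB_GO-_y`) are left out of the result. The flat case is the same function applied to a string with no `-` in it, which yields a single group.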
