I am trying to extract some text with a regex in Python. The regex is quite complicated because it is built on the fly depending on the language, so the best way to go about it is to compose it from different parts.
This is the present code:
import re

# Small-scale reproduction of the problem
num = r"""(\d\d)([A-Z])?"""  # two digits, optionally followed by a capital letter
sep = r"""and |or |, """  # separators allowed between numbers
# Pattern composition: several numbers joined by separators, or a single number
pattern = fr"""((({num})({sep}{num})+)|({num}))"""
text = """biscuits 10 are good
biscuits 20 and 30 are good
biscuits 40 and hot dog are good
but this one 50A and 50B and not ok"""
refs = re.finditer(pattern, text, re.VERBOSE)
for ref in refs:
    TEXT = ref.group(0)
    print(TEXT)
That gives every number as a separate hit. My desired outcome is the whole match:
10
20 and 30
40
50A and 50B
Basically, `num` is an expression that can appear alone or in combination with others separated by `sep`.
Of course, if `num` is followed by `sep` but not by another `num`, only `num` should be matched.
Does anyone know how to modify this code to achieve that?
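For reference, here is a minimal sketch of the rule described above. It rests on two assumptions that are not in the code as posted. First, `re.VERBOSE` is dropped, because in verbose mode unescaped whitespace in the pattern is ignored, so the spaces in `sep` never match. Second, `sep` is wrapped in its own non-capturing group with optional surrounding whitespace, so the alternation cannot swallow `num`. It has only been checked against the sample text, not against the real language-dependent pattern.

```python
import re

# Assumption: same building blocks as in the question, without capturing groups.
num = r"\d\d[A-Z]?"
# Assumption: the separator is grouped on its own and may be surrounded by whitespace.
sep = r"\s*(?:and|or|,)\s*"
# One num, optionally followed by any number of (sep num) pairs.
pattern = fr"{num}(?:{sep}{num})*"

text = """biscuits 10 are good
biscuits 20 and 30 are good
biscuits 40 and hot dog are good
but this one 50A and 50B and not ok"""

# No re.VERBOSE here, so whitespace in the pattern is matched literally.
matches = [m.group(0) for m in re.finditer(pattern, text)]
for match in matches:
    print(match)
```

Because a separator is only consumed when another `num` follows it, a trailing "and" with no number after it is backtracked over and left out of the match.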