I have a regex built from multiple patterns. It works fine, except that each result tuple also contains redundant entries. If I run the code below:
import re
# Raw strings keep the backslashes intact for the regex engine
re1 = r'SENT: (\w+)_\d{4}(\d+)'
re2 = r'SENT: (\w+)\s\w*\s\w{4}(\d{4})'
re3 = r'SENT: (\w+)\s\w+\s(\d{4})'
sentences = ['SENT: xyz File 20210630.csv', 'SENT: xyz_20210630_Details.csv', 'SENT: xyz File 070121.txt']
for sentence in sentences:
    generic_re = re.compile("(%s|%s|%s)" % (re1, re2, re3)).findall(sentence)
    print(generic_re)
Output:
[('SENT: xyz File 20210630', '', '', 'xyz', '0630', '', '')]
[('SENT: xyz_20210630', 'xyz', '0630', '', '', '', '')]
[('SENT: xyz File 0701', '', '', '', '', 'xyz', '0701')]
The full match (e.g. 'SENT: xyz File 20210630') and the empty strings '' are the redundant part. How can I get rid of them and keep only the two groups, (xyz) and (0630), in the output?
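
For reference, here is a minimal sketch of one possible way to get that result; it is not necessarily the only or best one. It assumes the three patterns above stay unchanged, wraps them in a non-capturing group `(?:...)` so the outer parentheses no longer add the full match to the tuple, and drops the groups that belong to the alternatives that did not match. The names `PATTERNS`, `COMBINED` and `extract_groups` are made up for this illustration.

```python
import re

# The three patterns from the question, written as raw strings.
PATTERNS = [
    r'SENT: (\w+)_\d{4}(\d+)',
    r'SENT: (\w+)\s\w*\s\w{4}(\d{4})',
    r'SENT: (\w+)\s\w+\s(\d{4})',
]

# (?:...) groups the alternatives without creating a capture group of its own,
# so the whole match no longer shows up as an extra tuple element.
COMBINED = re.compile('(?:%s)' % '|'.join(PATTERNS))


def extract_groups(sentence):
    """Hypothetical helper: return only the groups that took part in each match."""
    # Match.groups() yields None for groups in alternatives that did not match,
    # so filtering on None leaves just the two captured values.
    return [
        tuple(g for g in m.groups() if g is not None)
        for m in COMBINED.finditer(sentence)
    ]


if __name__ == '__main__':
    sentences = [
        'SENT: xyz File 20210630.csv',
        'SENT: xyz_20210630_Details.csv',
        'SENT: xyz File 070121.txt',
    ]
    for sentence in sentences:
        print(extract_groups(sentence))
```

Using `finditer` instead of `findall` is a deliberate choice in this sketch: `findall` reports unmatched groups as empty strings, while `Match.groups()` reports them as `None`, which makes them easy to tell apart from a group that matched.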