While experimenting with regexes and Python's re.findall, I came across this problem:
import re
line = "Lorem ipsum HELLO dolor sit amet, GOODBYE consectetuer adipiscing elit, HELLO sed diam nonummy nibh GOODBYE all"
matches = re.findall("(HELLO)(.*)(GOODBYE)", line, flags=re.MULTILINE)
print(matches)
This will output:
[('HELLO', ' dolor sit amet, GOODBYE consectetuer adipiscing elit, HELLO sed diam nonummy nibh ', 'GOODBYE')]
But what I wanted was more like...
[('HELLO', ' dolor sit amet', 'GOODBYE'), ('HELLO', 'sed diam nonummy nibh ', 'GOODBYE')]
So instead of taking the pairs one at a time, re.findall (given the way I have defined the pattern) seems to match from the first HELLO to the last GOODBYE, and it places everything in between into the middle group.
Is there a way to get the result I was looking for? I thought maybe "serializing" the HELLO and GOODBYE pairs might help, sort of like this:
line = "Lorem ipsum HELLO_1 dolor sit amet, GOODBYE_1 consectetuer adipiscing elit, HELLO_2 sed diam nonummy nibh GOODBYE_2 all"
But that seems to make the problem harder.
Any helpful ideas most appreciated!
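For reference, here is a minimal sketch of the behaviour described above, together with one possible way around it. It assumes the cause is that `.*` is greedy (it consumes as much as it can before the final GOODBYE) and that the lazy form `.*?` is acceptable here; the variable names `greedy` and `lazy` are only illustrative.

```python
import re

line = ("Lorem ipsum HELLO dolor sit amet, GOODBYE consectetuer adipiscing elit, "
        "HELLO sed diam nonummy nibh GOODBYE all")

# Greedy: ".*" runs to the LAST "GOODBYE", so there is only one match.
greedy = re.findall(r"(HELLO)(.*)(GOODBYE)", line)

# Lazy: ".*?" stops at the NEAREST "GOODBYE", so each pair is matched separately.
lazy = re.findall(r"(HELLO)(.*?)(GOODBYE)", line)

print(greedy)
print(lazy)
# [('HELLO', ' dolor sit amet, ', 'GOODBYE'), ('HELLO', ' sed diam nonummy nibh ', 'GOODBYE')]
```

The middle groups still carry the surrounding spaces and the trailing comma, so they would need a separate clean-up step (for example `str.strip(" ,")`) to match the desired output exactly.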