I have records being read from a file, each a string of data that I'd like to break into sections. A new section always begins with <xxx>, where xxx is any three alphabetic characters. Each section can be a different length.
Listed below is a sample snippet of the data:
<AAA>q2w *dc<BBB>12sd<CCC>wer(4rf) q w ddcd<DDD> w erdfWED#2w
Regardless of the pattern I use, I can't get the string to break as I'd like. I either get the entire string, or just the section identifier (<xxx>) and the very next character.
Listed below are a few patterns that I've tried, with the results immediately following:
matchLn1 = re.findall('(<\w{3}>.*)','<AAA>q2w *dc<BBB>12sd<CCC>wer(4rf) q w ddcd<DDD> w erdfWED#2w')
['<AAA>q2w *dc<BBB>12sd<CCC>wer(4rf) q w ddcd<DDD> w erdfWED#2w']
matchLn1 = re.findall('(<\w{3}>.*?)','<AAA>q2w *dc<BBB>12sd<CCC>wer(4rf) q w ddcd<DDD> w erdfWED#2w')
['<AAA>', '<BBB>', '<CCC>', '<DDD>']
matchLn1 = re.findall('(<\w{3}>.+?)','<AAA>q2w *dc<BBB>12sd<CCC>wer(4rf) q w ddcd<DDD> w erdfWED#2w')
['<AAA>q', '<BBB>1', '<CCC>w', '<DDD> ']
matchLn1 = re.findall('(<\w{3}>.?)','<AAA>q2w *dc<BBB>12sd<CCC>wer(4rf) q w ddcd<DDD> w erdfWED#2w')
['<AAA>q', '<BBB>1', '<CCC>w', '<DDD> ']
I tried a few other patterns as well, but the outcome was always the same. Any/all thoughts would be most welcome.
Thank you.
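
For reference, the attempts above seem to fail for two reasons: a greedy `.*` consumes the rest of the string, while a lazy `.*?` or `.+?` with nothing after it stops as early as it is allowed to. One possible approach is to give the lazy match something to stop at, such as a lookahead for the next tag or the end of the string. The following is only a sketch, assuming each section should run from its tag up to (but not including) the next tag; the names `record`, `pattern`, and `sections` are made up for illustration, and `[A-Za-z]` is used in place of `\w` because `\w` also matches digits and underscores.

```python
import re

# Sample record copied from the question.
record = '<AAA>q2w *dc<BBB>12sd<CCC>wer(4rf) q w ddcd<DDD> w erdfWED#2w'

# A three-letter tag, then as few characters as possible (lazy .*?),
# stopping only where the lookahead sees the next tag or the end of the string.
pattern = re.compile(r'<[A-Za-z]{3}>.*?(?=<[A-Za-z]{3}>|$)')

# With no capturing groups, findall returns the full text of each match.
sections = pattern.findall(record)
print(sections)
# ['<AAA>q2w *dc', '<BBB>12sd', '<CCC>wer(4rf) q w ddcd', '<DDD> w erdfWED#2w']
```

The lookahead `(?=...)` checks what comes next without consuming it, so the next tag is still available as the start of the following match.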