I want to take the HTML text below and turn it into a list of [text, tag] pairs, as shown in the desired output. I believe this can be done with regex, but I am not sure how. How can I do this?
Starting HTML text:
peanut butter1
<ul id="ul0002" list-style="none">peanut butter2
<li id="ul0002-0001" num="0000">2.0 to 6.0 mg of 17β-estradiol and</li>
<li id="ul0002-0002" num="0000">0.020 mg of ethinylestradiol;</li>
<br>
<li id="ul0002-0003" num="0000">0.25 to 0.30 mg of drospirenone and</li>peanut butter3
</ul>peanut butter4
Desired output:
list = [
['peanut butter1', 'no tag'],
['peanut butter2', 'ul'],
['2.0 to 6.0 mg of 17β-estradiol and', 'li'],
['0.020 mg of ethinylestradiol;', 'li'],
['<br>', 'no tag'],
['0.25 to 0.30 mg of drospirenone and', 'li'],
['peanut butter3', 'no tag'],
['peanut butter4', 'no tag'],
]
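One possible approach, offered as a minimal sketch rather than a definitive solution: split the text on anything that looks like a tag, then walk the pieces while remembering the most recent opening tag. The function name `tag_text_pairs` and the `VOID_TAGS` set are hypothetical names chosen for this illustration. The sketch assumes that text directly after an opening tag gets that tag's name, that text after a closing tag counts as 'no tag', and that `<br>` is kept as its own 'no tag' entry, which is how the desired output reads. It also assumes simple, well-behaved markup; a regex like this will likely misfire on comments, scripts, or attribute values containing `>`.

```python
import re

# Splitting on a capturing group keeps the tags in the result list.
TAG_SPLIT = re.compile(r'(<[^>]+>)')
# Group 1 is '/' for a closing tag, group 2 is the tag name.
TAG_PARTS = re.compile(r'<(/?)(\w+)[^>]*>')
# Assumption: tags listed here are emitted verbatim as 'no tag' entries.
VOID_TAGS = {'br'}


def tag_text_pairs(html):
    """Return [text, tag] pairs for a simple HTML fragment."""
    pairs = []
    current = 'no tag'  # tag that the next piece of text will be labelled with
    for token in TAG_SPLIT.split(html):
        match = TAG_PARTS.fullmatch(token)
        if match is None:
            # Plain text: skip whitespace-only pieces between tags.
            text = token.strip()
            if text:
                pairs.append([text, current])
            continue
        closing, name = match.group(1), match.group(2).lower()
        if name in VOID_TAGS:
            pairs.append([token, 'no tag'])
            current = 'no tag'
        elif closing:
            # Text following a closing tag is treated as untagged.
            current = 'no tag'
        else:
            current = name
    return pairs


sample = """peanut butter1
<ul id="ul0002" list-style="none">peanut butter2
<li id="ul0002-0001" num="0000">2.0 to 6.0 mg of 17β-estradiol and</li>
<li id="ul0002-0002" num="0000">0.020 mg of ethinylestradiol;</li>
<br>
<li id="ul0002-0003" num="0000">0.25 to 0.30 mg of drospirenone and</li>peanut butter3
</ul>peanut butter4"""

for pair in tag_text_pairs(sample):
    print(pair)
```

For anything beyond small fragments like this one, a real HTML parser (for example the standard library's `html.parser` module) is usually a safer choice than regex, since it copes with nesting and malformed markup.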