So I am trying to write my own scripts that will take in html files and return errors as well as clean them (doing this to learn regex and because I find it useful)
I am starting by having a quick function that takes the document, and grabs all of the tags in the correct order so I can check to make sure that they are all closed...I use the following:
>>> s = """<a>link</a>
... <div id="something">
... <p style="background-color:#f00">paragraph</p>
... </div>"""
>>> re.findall('(?m)<.*>',s)
['<a>link</a>', '<div id="something">', '<p style="background-color:#f00">paragraph</p>', '</div>']
I understand that it grabs everything between the two carrot brackets, and that that becomes the whole line. What would I use to return the following:
['<a>','</a>', '<div id="something">', '<p style="background-color:#f00">','</p>', '</div>']