I'm looking for the HTML end tag in an MHTML file. The HTML is stored in fixed-width rows, each ending with a line break, like this:
size:12pt">Insert an image into the document here.</span></p><p style=3D"ma=
rgin:0pt 0pt 3pt; text-align:center"><img src=3D"image.001.png" width=3D"20=
0" height=3D"200" alt=3D"" /></p><p style=3D"margin:0pt 0pt 3pt"><span styl=
e=3D"font-family:Arial; font-size:12pt"> </span></p></div></body></htm=
l>
Notice the </html> end tag is split in the middle by "=\n".
How can I find the </html> end tag regardless of where it is split?
I can find any single permutation with a regex like one of the following, but I'd like one pattern that covers every split position in one shot.
<((=\n)?/html>)
</((=\n)?html>)
</h((=\n)?tml>)
</ht((=\n)?ml>)
etc...
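To make the "one shot" idea concrete, here is a rough Python sketch of the kind of pattern I have in mind: an optional "=\n" is allowed between every pair of characters of the tag. The names `SOFT_BREAK`, `split_tolerant_pattern` and `END_HTML` are made up for illustration, and accepting "\r\n" as well as "\n" is my assumption about how the file's line endings might look.

```python
import re

# Optional soft line break: "=" followed by a newline.
# Assumption: the file may use either "\n" or "\r\n" line endings.
SOFT_BREAK = r"(?:=\r?\n)?"


def split_tolerant_pattern(literal: str) -> re.Pattern:
    """Build a regex that matches `literal` even when a soft line break
    appears between any two of its characters."""
    return re.compile(SOFT_BREAK.join(re.escape(ch) for ch in literal))


# Hypothetical name: a single pattern covering every split position of </html>.
END_HTML = split_tolerant_pattern("</html>")

sample = '</span></p></div></body></htm=\nl>\n'
match = END_HTML.search(sample)
if match:
    # repr() makes the embedded line break visible in the output.
    print("found end tag:", repr(match.group(0)), "at", match.span())
```

This only allows a break between characters of the tag, which is the case shown in the sample; it makes no attempt to decode the rest of the file.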
I've read "RegEx match open tags except XHTML self-contained tags" and the post at http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html, among others, but I still think the question is valid.
I'm not making an HTML parsing engine. I'm just looking for one very specific pattern. And... this has to go out tomorrow. All great reasons to go with a down-and-dirty solution >:D