Now, before you prepare to write a speech about the perils of parsing HTML with regex, I already know them. This is more of a curiosity question than something I need answered for practical use.
Basically, given a file of HTML in some random but perfectly valid format, can you parse out the content of `<p>` tags using a half-sane number of regular expressions (pretending that `<p>` tags cannot be nested, or some other minor limitation)?
If no `<p>` contains any nested tags, then it's relatively simple. You just have to strip all comments, scripts and the like, then find matching `<p>` tags. If the HTML is not valid, then it can be very difficult.
– Orbling Jan 07 '11 at 02:01
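
For the curious, the approach in the comment above (strip comments and scripts, then match `<p>` pairs) could be sketched roughly as follows in Python. This is only an illustration under loud assumptions: every `<p>` has an explicit `</p>` (valid HTML may omit it, which this does not handle), `<p>` tags are never nested, and no `<!--` or `<script` appears inside an attribute value. The names `STRIP`, `PARA` and `extract_paragraphs` are made up for this example.

```python
import re

# Pass 1: remove comments plus <script>/<style> blocks, which may contain
# text that merely looks like <p> tags.
STRIP = re.compile(
    r"<!--.*?-->|<(script|style)\b[^>]*>.*?</\1\s*>",
    re.DOTALL | re.IGNORECASE,
)

# Pass 2: lazily match from an opening <p ...> to the next </p>.
# The \b keeps <pre>, <param> and similar tags from matching.
PARA = re.compile(r"<p\b[^>]*>(.*?)</p\s*>", re.DOTALL | re.IGNORECASE)


def extract_paragraphs(html):
    """Return the inner content of each <p>...</p> pair (hypothetical helper)."""
    cleaned = STRIP.sub("", html)
    return [content.strip() for content in PARA.findall(cleaned)]


sample = (
    '<html><!-- <p>hidden</p> -->'
    '<script>var s = "<p>no</p>";</script>'
    '<p class="a">Hello <b>world</b></p>\n'
    '<P>Second</P><pre>x</pre></html>'
)
print(extract_paragraphs(sample))
```

Two regular expressions is arguably a half-sane number, but the moment any of the assumptions fails (omitted end tags, nested markup you care about, malformed input) a real HTML parser is the safer route.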