I am trying to write the following regular expressions for a program of mine that parses a given HTML file. Could you help me with any of these?
<div>
<div class="menuItem">
<span>
class="emph"
Any string beginning with < and ending with >, i.e. all tags.
The contents of the body tag.
The contents of all divs.
All divs that make up menus.
I have managed to figure out that the single div tag is simply "<div>"
and the "all tags" expression is <(\"[^\"]*\"|'[^']*'|[^'\">])*>
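To show what I mean, here is a minimal sketch of those two expressions applied to plain text. Python and its re module are used only for illustration, and the sample string is made up (it is not my real file). The all-tags pattern is written as a raw string, so the backslashes before the quotes are not needed.

```python
import re

# The two patterns from the post, written as raw strings.
DIV_TAG = re.compile(r"<div>")
ANY_TAG = re.compile(r"""<("[^"]*"|'[^']*'|[^'">])*>""")

# Hypothetical sample input, for illustration only.
sample = (
    '<body><div class="menuItem"><span title="a > b">Home</span></div>'
    "<div>x</div></body>"
)

# Literal match: only the bare <div> tag, not <div class="...">.
div_tags = DIV_TAG.findall(sample)

# finditer + group(0) returns the whole tag; findall would return
# only the last repetition of the capturing group.
all_tags = [m.group(0) for m in ANY_TAG.finditer(sample)]

print(div_tags)
print(all_tags)
```

The quoted alternatives are what let the pattern skip over a ">" that sits inside an attribute value, as in the title attribute of the sample.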
Do you think you could help me with any of the rest? Thank you in advance guys...
I know that HTML parsing is an already solved problem and that regex is not an efficient way to do it. However, I am required to do it this way, in order to demonstrate how regular expressions work by making them (sometimes) long and detailed. That is why I am treating my HTML file as a plain text file and need to apply these regular expressions to it.