I have a HUGE HTML document that I need to parse.
The document is a list of <p> elements, all direct children of the body tag.
They differ only in their class names. The structure is like this:
<p class="first-level"></p>
<p class="second-level"></p>
<p class="third-level"></p>
<p class="third-level"></p>
<p class="nth-levels just-for-demo-1"></p>
<p class="nth-levels just-for-demo-1"></p>
<p class="third-level"></p>
<p class="second-level"></p>
<p class="third-level"></p>
<p class="nth-levels just-for-demo-2"></p>
<p class="first-level"></p>
<p class="second-level"></p>
<p class="second-level"></p>
<p class="third-level"></p>
And so on. nth-levels can be any class name that isn't first-level, second-level or third-level.
Basically it's a multi-level <ul> element, very poorly marked up.
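What I want to do is parse it and obtain all <p> elements (the whole element including its tags, not just innerHTML) that sit between elements with one of the class names above, i.e. every <p> whose class is not first-level, second-level or third-level.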
In the example above, I want to get:
<p class="nth-levels just-for-demo-1"></p>
<p class="nth-levels just-for-demo-1"></p>
and
<p class="nth-levels just-for-demo-2"></p>
How the heck can I do that please? Thank you.
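For illustration, here is a rough sketch of one way this could be done. The question doesn't name a language, so Python with the standard library's html.parser is an assumption, and the names LEVEL_CLASSES, RunCollector and find_runs are made up for this example. It also assumes every <p> is explicitly closed, as in the sample, and that consecutive matching elements should come back grouped together.

```python
from html.parser import HTMLParser

# Class names that mark the known levels; anything else counts as "nth-level".
LEVEL_CLASSES = {"first-level", "second-level", "third-level"}


class RunCollector(HTMLParser):
    """Group consecutive <p> elements that carry none of LEVEL_CLASSES."""

    def __init__(self):
        # Keep entities raw so the captured markup matches the source.
        super().__init__(convert_charrefs=False)
        self.runs = []     # finished groups, each a list of outer-HTML strings
        self._run = []     # group currently being built
        self._buf = None   # pieces of the <p> being captured, or None

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self._buf is None:
            classes = set((dict(attrs).get("class") or "").split())
            if classes & LEVEL_CLASSES:
                self._flush()  # a level element ends the current group
            else:
                self._buf = [self.get_starttag_text()]
        elif self._buf is not None:
            self._buf.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> inside a captured <p>.
        if self._buf is not None:
            self._buf.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._buf is None:
            return
        self._buf.append(f"</{tag}>")
        if tag == "p":
            self._run.append("".join(self._buf))
            self._buf = None

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_entityref(self, name):
        if self._buf is not None:
            self._buf.append(f"&{name};")

    def handle_charref(self, name):
        if self._buf is not None:
            self._buf.append(f"&#{name};")

    def _flush(self):
        if self._run:
            self.runs.append(self._run)
            self._run = []

    def close(self):
        super().close()
        self._flush()  # keep a group that runs to the end of the document


def find_runs(html):
    """Return a list of groups of outer-HTML strings for non-level <p> elements."""
    collector = RunCollector()
    collector.feed(html)
    collector.close()
    return collector.runs
```

For a really big file, feed() can be called repeatedly with chunks read from disk instead of loading the whole document into one string, which is the main reason for picking an event-based parser here rather than building a full DOM.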