How do I write a regular expression in PHP to match the HTML <p> elements that come AFTER the first <H1> tag?
For example, the following checks whether the expression fails to match:
if(!preg_match_all('#<p(.*?)<\/p>#', $page_content, $matches))
In properly written HTML (i.e. HTML that isn't designed to break all sorts of parsers by abusing loopholes in the SGML specification), every <h1> has a corresponding closing tag. That means you can simply look for a <p> preceded by a </h1>.
<\/h1>[\s\S]*?<p>([\s\S]*?)<\/p>
Here's how the above regex works, and a proof of concept:
<\/h1> matches </h1> literally
[\s\S]*? matches all characters until the next <p>
<p> matches <p> literally
([\s\S]*?) matches all characters until the next </p> (note the capturing group - this group contains what you want)
<\/p> matches </p> literally
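
As a rough illustration, here is how the regex above might be used from PHP. The sample markup is made up for the example, and $page_content is just the variable name from the question. Note that the pattern captures only the first <p> following a </h1>, and that a literal <p> will not match tags with attributes. The second half is one possible sketch (not part of the regex above) for collecting every <p> after the first </h1>: it cuts the string at the first closing tag with stripos() and substr(), then runs preg_match_all() on the remainder.

```php
<?php
// Hypothetical sample input; replace with your real page content.
$page_content = <<<HTML
<p>Intro before the heading</p>
<h1>Title</h1>
<p>First paragraph</p>
<p>Second paragraph</p>
HTML;

// The regex from the answer, wrapped in # delimiters as in the question.
$pattern = '#<\/h1>[\s\S]*?<p>([\s\S]*?)<\/p>#';

// Group 1 holds the content of the first <p> after </h1>.
$first = null;
if (preg_match($pattern, $page_content, $match)) {
    $first = $match[1];
}

// Sketch (an assumption about what "all <p> after the first <h1>" means):
// keep only the text after the first </h1>, then match every <p> in it.
$after = [];
$pos = stripos($page_content, '</h1>'); // case-insensitive, so </H1> is found too
if ($pos !== false) {
    $rest = substr($page_content, $pos + strlen('</h1>'));
    preg_match_all('#<p>([\s\S]*?)<\/p>#', $rest, $matches);
    $after = $matches[1]; // contents of each matched <p>
}

var_dump($first, $after);
```

With this sample input, $first should hold the first paragraph after the heading, and $after should hold both paragraphs that follow it while skipping the one before the <h1>.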