
How do I write a regular expression in PHP to match HTML <p> elements that come after the first <h1> tag?

For example, the following tests whether the page content does not match the expression:

if (!preg_match_all('#<p(.*?)<\/p>#', $page_content, $matches))
Jay Blanchard
StoryTech

1 Answer


In properly written HTML (i.e., HTML that isn't designed to break all sorts of parsers by abusing loopholes in the SGML specification), every <h1> will have a corresponding closing tag. That means you can simply look for a <p> preceded by a </h1>.

<\/h1>[\s\S]*?<p>([\s\S]*?)<\/p>

Here's how the above regex works:

  • <\/h1> matches </h1> literally
  • [\s\S]*? lazily matches any characters, including newlines, up to the next <p>
  • <p> matches <p> literally
  • ([\s\S]*?) lazily matches any characters, including newlines, up to the next </p> (note the capturing group: this group contains what you want)
  • <\/p> matches </p> literally
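
As a rough illustration of the breakdown above, here is a minimal PHP sketch that applies the pattern with preg_match. The sample markup in $page_content is made up for the example. The second half is an assumed extension, not part of the pattern above: since that pattern needs a </h1> in front of every match, one way to collect every <p> after the first </h1> is to cut the string at the first </h1> and scan only the remainder. The looser <p\b[^>]*> opening tag (to allow attributes) is likewise an assumption.

```php
<?php
// Hypothetical sample input; not taken from the original question.
$page_content = <<<HTML
<p>Intro before the heading</p>
<h1>Title</h1>
<p>First paragraph</p>
<p>Second paragraph</p>
HTML;

// The pattern from above, wrapped in # delimiters so "/" needs no escaping.
$pattern = '#</h1>[\s\S]*?<p>([\s\S]*?)</p>#';

// preg_match stops at the first match: the first <p> after the first </h1>.
$first = preg_match($pattern, $page_content, $m) ? $m[1] : null;

// Assumed extension: keep only the text after the first </h1>,
// then collect every paragraph in that remainder.
$pos = stripos($page_content, '</h1>');
$after = ($pos === false) ? '' : substr($page_content, $pos + strlen('</h1>'));
preg_match_all('#<p\b[^>]*>([\s\S]*?)</p>#i', $after, $matches);
$all = $matches[1];

echo $first, "\n";          // First paragraph
echo implode(', ', $all);   // First paragraph, Second paragraph
```

For anything beyond simple, well-formed markup, a real HTML parser such as PHP's DOMDocument is likely to be more robust than a regular expression.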
Community
The SE I loved is dead