PHP: How to ignore newline in Regex

Question

I've already found a lot of Stack Overflow questions on this topic, but none of them solves my problem.

I have the following html:

<p><a name="first-title"></a></p>
<h3>First Title</h3>
<h2><a href='#second'>Second Title</a></h2>
<h3>Third Title</h3>

I want to find the <h3> that is preceded by </a></p>. In this case, the output should be:

<h3>First Title</h3>

So I wrote the following regular expression:

preg_match_all('/(?<=<\/a><\/p>)<h3>(.+?)<\/h3>/s',$html,$data);

This regular expression matches nothing in the HTML above. But if I remove the newlines from the HTML, it correctly returns the desired result.

I would rather not remove the newlines from the HTML if possible. How should I write the regular expression so that it ignores the newlines in the source string?

Please, help me.

Read this http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454. Regexes are NOT the way to parse HTML — Jojodmo, Jun 28 '15 at 22:00

Avinash Raj · Accepted Answer · 2015-06-28T16:49:02.993

4

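This is a case for \K, since a PCRE lookbehind assertion cannot contain an unbounded quantifier such as \s*. \K drops everything matched so far from the reported match, so the pattern before it can be of any length.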

preg_match_all('/<\/a><\/p>\s*\K<h3>(.+?)<\/h3>/s',$html,$data);

Or, if the tags are always separated by exactly one newline, just put a \n character inside the lookbehind.

preg_match_all('/(?<=<\/a><\/p>\n)<h3>(.+?)<\/h3>/s',$html,$data);
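To compare the patterns side by side, here is a minimal sketch that runs the question's pattern and both patterns above against the question's HTML. The variable names ($strict, $withK, $withNewline) are only illustrative, and the input is assumed to use "\n" line endings. With "\r\n" line endings the \n lookbehind would most likely miss, while the \s* version would still match.

```php
<?php
// Sample input from the question, joined with "\n" line endings (assumed).
$html = implode("\n", [
    '<p><a name="first-title"></a></p>',
    '<h3>First Title</h3>',
    "<h2><a href='#second'>Second Title</a></h2>",
    '<h3>Third Title</h3>',
]);

// Question's pattern: the lookbehind needs </a></p> directly before <h3>,
// but a newline sits between them, so nothing matches.
preg_match_all('/(?<=<\/a><\/p>)<h3>(.+?)<\/h3>/s', $html, $strict);

// \K pattern: match </a></p> and any whitespace, then reset the start of
// the reported match so that group 0 begins at <h3>.
preg_match_all('/<\/a><\/p>\s*\K<h3>(.+?)<\/h3>/s', $html, $withK);

// Lookbehind with a literal newline: allows exactly one "\n" between the tags.
preg_match_all('/(?<=<\/a><\/p>\n)<h3>(.+?)<\/h3>/s', $html, $withNewline);

print_r($strict[0]);       // full matches from the question's pattern
print_r($withK[0]);        // full matches from the \K pattern
print_r($withK[1]);        // captured heading text from the \K pattern
print_r($withNewline[0]);  // full matches from the \n lookbehind pattern
```

In both working patterns, index 0 of the result holds the full <h3>...</h3> element and index 1 holds only the text captured by (.+?).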

edited Jun 28 '15 at 16:49

answered Jun 28 '15 at 16:45

Avinash Raj
