
In a string containing some HTML, I want to find and replace every occurrence of an <h1> to <h6> tag, including anything that follows it, up to the next <h1> to <h6> tag or the end of the HTML string.

My pattern: <h\d.+?(?=<h\d)

With /gs flags this pattern works fine on this online testing tool.

However, in a server-side test my pattern only matches the first occurrence, while the rest are ignored.

PHP Manual states:

Searches subject for matches to pattern and replaces them with replacement.

An answer to another post mentions:

preg_replace() will perform global replacements by default

According to the above, my server side pattern should work if changed to /<h\d.+?(?=<h\d)/s, but for some reason it still only replaces the first occurrence.

Full code:

$html = get_html_string();
$pattern = '/<h\d.+?(?=<h\d)/s';
$replace = '<div>$0</div>';
$html = preg_replace($pattern, $replace, $html);
return $html;


Update:

It looks like my HTML example somehow differs from the actual HTML on the website. Because of this, I copied the string that I want to manipulate directly into the online tester tool. It is now apparent that matching works, and the actual problem is that the last match is not included. See this updated test.
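The behaviour described in this update can be reproduced with a short sketch. The sample string below is a hypothetical stand-in, not the real HTML from the website; it only assumes three headings, each followed by a paragraph.

```php
<?php
// Hypothetical sample input (an assumption for illustration only).
$html = "<h1>One</h1>\n<p>a</p>\n<h2>Two</h2>\n<p>b</p>\n<h3>Three</h3>\n<p>c</p>";

// Original pattern: the lookahead requires another heading after each match.
$count = preg_match_all('/<h\d.+?(?=<h\d)/s', $html, $matches);

// The <h3> section has no heading after it, so the lookahead fails there
// and that last section is never matched.
echo $count, "\n";
print_r($matches[0]);
```

With this input only the <h1> and <h2> sections are matched, which is the same symptom as the missing last match.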

Thanks to Nick for the answer and everyone else for chiming in.

bewww

1 Answer


You have a couple of issues:

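  1. To get multiple matches you need the g (global) flag in the online tester. PHP has no g flag, so use preg_match_all() there (preg_replace() already replaces every match by default).
  2. You need to add $ (end-of-string) as an alternative in your lookahead so that the last <hn> tag can match up to the end of the string.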

This should do what you want:

<h\d.+?(?=<h\d|$)

Demo on regex101

In PHP:

preg_match_all('/<h\d.+?(?=<h\d|$)/s', $html, $matches);
print_r($matches[0]);

Output:

Array
(
    [0] => <h1> attribute="whatever">asiponfpweg ihnasegio</h1>asd
<p>whatever</p>
<img src=""></img>
    [1] => <h3> attribute="whatever">asiponfpweg ihnasegio</h3>
<p>whatever</p>
<p>whatever</p>
    [2] => <h1><span> attribute="whatever">asiponfpweg ihnasegio</span></h3>
<p>whatever</p>
    [3] => <h3> attribute="whatever">asiponfpweg ihnasegio</h3>
)

Demo on 3v4l.org
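The same pattern can be used with preg_replace() to wrap each section, as the question asks. This is a minimal sketch: the sample string is a hypothetical stand-in for the real HTML, and the replacement is the one from the question.

```php
<?php
// Hypothetical sample input (an assumption for illustration only).
$html = "<h1>One</h1>\n<p>a</p>\n<h2>Two</h2>\n<p>b</p>";

// "|$" lets the lookahead also succeed at the end of the string,
// so the last heading section is matched as well.
$pattern = '/<h\d.+?(?=<h\d|$)/s';
$replace = '<div>$0</div>';

// preg_replace() replaces every match by default; no g flag is needed.
$result = preg_replace($pattern, $replace, $html);
echo $result;
```

One caveat: without the D modifier, $ also matches just before a trailing newline, so if the string ends in "\n" that final newline stays outside the last <div>. Using \z instead of $ is a stricter alternative if that matters.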

Nick
  • I should have mentioned that I am trying to make this work with preg_replace. I have rewritten my question. – bewww Apr 24 '20 at 11:21
  • @bewww sorry about lack of response - it's been night time here. I guess you've got it resolved, but the same regex works fine with `preg_replace`, see https://3v4l.org/K8h0P – Nick Apr 24 '20 at 22:14