I'm trying to replace all instances of paragraphs containing only a series of the same character with a separator tag.

I tested the pattern on https://www.phpliveregex.com/ and copied it unchanged to PHP on my server and to http://phptester.net/, but it does not work in either place.

Sample:

$test = "<p>Sed nec convallis tortor. Aenean ante diam, aliquet eget porta in, cursus a nibh. Suspendisse eu tempus sem, sit amet malesuada arcu. Nunc condimentum a elit eget elementum. Curabitur id erat et dolor mattis luctus id id massa.</p>
<p>XXXXXXX</p>
<p><em>Nulla vel ligula arcu. Vivamus nec nisi sit amet dui vulputate suscipit.</em></p>
<p><em>Suspendisse finibus lectus ut elit molestie, ornare accumsan lacus accumsan.</em></p>
<p><em>Fusce vel blandit dolor, ac imperdiet purus.</em>.</p>";

echo preg_replace("/<p>(.)\1{3,}<\/p>/i", "<hr />", $test);

This will still output the <p>XXXXXXX</p> line, not the intended <hr />.

Any ideas? Tips?

dudah84

2 Answers

I was able to fix it by replacing your double quotes with single quotes.

echo preg_replace('/<p>(.)\1{3,}<\/p>/i', '<hr />', $test);

After a bit more testing, I found that your original code also works if you put two backslashes (\\) before the 1 in your pattern, like so:

echo preg_replace("/<p>(.)\\1{3,}<\/p>/i", "<hr />", $test);

I believe this is because PHP treats \1 inside double quotes as an escape sequence and replaces it with a single character before the pattern reaches the regex engine, so the engine never sees the backreference. Putting two backslashes before the 1 escapes the backslash itself, making it literal, so the engine receives \1 as intended. My regex is not all that good, but I think that is what is going on. You can avoid all of this by using single quotes around your pattern.
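
To see the difference side by side, here is a minimal sketch. It assumes a shortened stand-in for the sample string ($html is a hypothetical name, not from the question) and runs the same pattern with both kinds of quotes:

```php
<?php
// Shortened stand-in for the question's sample; $html is a hypothetical name.
$html = "<p>Intro text.</p>\n<p>XXXXXXX</p>\n<p>More text.</p>";

// Double quotes: PHP turns \1 into the single character with code 1
// before the regex engine sees the pattern, so nothing matches.
$doubleQuoted = preg_replace("/<p>(.)\1{3,}<\/p>/i", "<hr />", $html);

// Single quotes: the backslash survives, so \1 is a backreference.
$singleQuoted = preg_replace('/<p>(.)\1{3,}<\/p>/i', '<hr />', $html);

var_dump($doubleQuoted === $html); // true: the input comes back unchanged
echo $singleQuoted, "\n";          // the XXXXXXX paragraph is now <hr />
```

Only the single-quoted call should replace the middle paragraph.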

Joseph_J
  • why not just escape the `<` in the `<p>`? seems to be the issue here. – Funk Forty Niner Sep 22 '18 at 02:12
  • @Funk Forty Niner I tried your suggestion with his original code and was not able to get it to work. `preg_replace("/<p>(.)\1{3,}<\/p>/i", "\<hr />", $test);` did not work for me. – Joseph_J Sep 22 '18 at 02:20
  • Ah ok. Sorry, I deleted my comment. I thought it was the OP's answer LOL! Ok, well I was sure that was it. Oh well, can't get them all. – Funk Forty Niner Sep 22 '18 at 02:23
  • I was curious to see if that would do it. Was trying to learn something from it. I think it has to do with the double quotes not working on the backslash `\1` in the pattern. – Joseph_J Sep 22 '18 at 02:26
  • Now that's where it stops for me, the "pattern". I know some regex but not enough to offer a concrete answer. One of the characters that is widely used in PHP is the "less than" symbol and requires special attention when it's needed in a search pattern of sorts. That's all I know when I need to escape them but I never had to, so I just prepare, bind, execute and make sure all passwords have been dealt with properly. *Cheers* – Funk Forty Niner Sep 22 '18 at 02:33
  • Yeah my regex is not all that good either. I think I figured out the reason though... ~Cheers! – Joseph_J Sep 22 '18 at 02:40
  • Yeah saw that. Both answers look good to me, least from what I read :-) Ok, tying up comments. Ciao for now. – Funk Forty Niner Sep 22 '18 at 02:41
  • Joseph, that solved it, thank you! :) – dudah84 Sep 22 '18 at 03:34
  • No problem, good luck with the rest of your project. Cheers! – Joseph_J Sep 22 '18 at 03:34

As Joseph_J pointed out, the problem is that \1 has to be passed to the regex engine, that is, a string of length 2 made up of the two ASCII characters "\" and "1". But in a double-quoted PHP string, a backslash followed by one to three octal digits (\[0-7]{1,3}) represents a single character in octal notation. So "\1" is a string of length 1 consisting of the character with ASCII value 1. Here's a small overview:

source code     internal string     length
"\1"            ascii code 1        1
'\1'            \1                  2
"\\1"           \1                  2
'\\1'           \1                  2
"\134\61"       \1                  2
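
The overview above can be checked directly. This is a minimal sketch; the $forms array and its labels are hypothetical names introduced for illustration:

```php
<?php
// The five literals from the overview; labels are for illustration only.
$forms = [
    'double quotes, one backslash'   => "\1",
    'single quotes, one backslash'   => '\1',
    'double quotes, two backslashes' => "\\1",
    'single quotes, two backslashes' => '\\1',
    'double quotes, octal escapes'   => "\134\61",
];

foreach ($forms as $label => $value) {
    // bin2hex shows the raw bytes: 01 is the control character,
    // 5c31 is a backslash followed by the digit 1.
    printf("%-32s length %d, bytes %s\n", $label, strlen($value), bin2hex($value));
}
```

Only the first literal should report length 1; the other four all produce the two bytes that the regex engine needs.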

If you also want to cover newlines ("containing only a series of the same character"), then the pattern modifier s is needed so that . also matches newlines:

preg_replace('/<p>(.)\1*<\/p>/is', '<hr />', $test);
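
A small sketch of what the s modifier changes, assuming a hypothetical paragraph that contains nothing but newline characters ($blank is not from the question):

```php
<?php
// Hypothetical input: a paragraph containing only newline characters.
$blank = "<p>\n\n\n\n</p>";

// Without s, . cannot match \n, so the paragraph is left alone.
$withoutS = preg_replace('/<p>(.)\1*<\/p>/i', '<hr />', $blank);

// With s, . matches \n and the backreference repeats it.
$withS = preg_replace('/<p>(.)\1*<\/p>/is', '<hr />', $blank);

var_dump($withoutS === $blank);  // true: no replacement happened
var_dump($withS === '<hr />');   // true: the paragraph was replaced
```

Note that \1* also accepts a paragraph holding a single character, such as <p>X</p>. Keeping the question's {3,} quantifier instead would require at least four identical characters.
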
steffen
  • steffen, removing the \1 will make the pattern also match "<p>abcdefg</p>", which is not my goal. Joseph's solution does the trick. – dudah84 Sep 22 '18 at 02:53
  • @dudah84 Haha, missed that part :-) I updated my answer accordingly. – steffen Sep 22 '18 at 11:34