-6

Output (var $DESC)

 <p>erster Absatz</p>
 <p>zweiter Absatz</p>

Regex (PHP)

 preg_replace("<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>{2}", '', $DESC)

I would like to delete only the second p but this regex finds both. Thanks for any help.

Stiller Eugen
  • Will it always be a `<p>` block followed immediately by another `<p>` block, or does it need to check for any element that occurs twice in a row? – CAustin Sep 27 '17 at 18:47
  • Thanks. It will always be like this: a p-block followed by another p-block. – Stiller Eugen Sep 27 '17 at 18:48
  • You really should use an HTML parser for this... – ctwheels Sep 27 '17 at 18:50
  • Also, I didn't vote your question down, but I know why others did. Using regex to parse HTML (and other non-regular languages) is highly discouraged https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – CAustin Sep 27 '17 at 18:50
  • What's a better solution? I get this string from a feed, I cannot change anything there, and I don't need the second p block. – Stiller Eugen Sep 27 '17 at 18:54
  • Assuming you can't use an HTML parser (for whatever reason, even though you're using PHP and that's **exactly** how you should approach this), you can use this `<p>(.*?)<\/p>` and grab the second result from your `preg_match_all`. But again, I would **highly discourage using regex for this** – ctwheels Sep 27 '17 at 18:54
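
For reference, the parser route suggested in the comments could look roughly like the sketch below. It is not code from any commenter; it assumes PHP's bundled DOM extension is available and that the feed always delivers at least two `<p>` blocks. The sample string and the variable names `$doc`, `$paragraphs` and `$result` are made up for illustration.

```php
<?php
// Hypothetical sample input mirroring the question's $DESC.
$DESC = "<p>erster Absatz</p>\n <p>zweiter Absatz</p>";

// Keep parser warnings about the HTML fragment out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// loadHTML() wraps the fragment in <html><body>...</body></html>.
$doc->loadHTML($DESC);
libxml_clear_errors();

// Remove the second <p> element if there is one.
$paragraphs = $doc->getElementsByTagName('p');
if ($paragraphs->length >= 2) {
    $second = $paragraphs->item(1);
    $second->parentNode->removeChild($second);
}

// Serialize only the children of <body>, so the implied wrapper
// elements do not end up in the result.
$result = '';
$body = $doc->getElementsByTagName('body')->item(0);
if ($body !== null) {
    foreach ($body->childNodes as $node) {
        $result .= $doc->saveHTML($node);
    }
}
$result = trim($result);

echo $result, PHP_EOL;
```

If the feed contains non-ASCII text such as umlauts, the encoding passed to `loadHTML()` probably needs extra care; that is outside the scope of this sketch.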

2 Answers

2

Normally I would just tell you to use an HTML parser instead of regex, but since your requirement is so specific, this can actually be accomplished with regex quite safely.

(?<=<\/p>)\s+<p>[\w ]+<\/p>

https://regex101.com/r/Yqaajy/6

Explanation:

(?<=<\/p>) - Make sure the rest of the pattern is preceded by a </p> ending tag (positive lookbehind).

\s+ - One or more whitespace characters, including the line break between the two blocks. Note that \s matches newlines whether or not single line mode is enabled; that mode only changes what the dot matches.

<p>[\w ]+<\/p> - A paragraph block containing one or more word characters (digits, letters, and underscore) and spaces.
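
As a minimal sketch, the pattern above could be applied in PHP like this. The `~` delimiter and the variable names `$pattern` and `$cleaned` are my own choices for the example, not part of the pattern itself.

```php
<?php
// Sample input copied from the question.
$DESC = "<p>erster Absatz</p>\n <p>zweiter Absatz</p>";

// '~' is used as the delimiter so the slash in </p> needs no escaping.
$pattern = '~(?<=</p>)\s+<p>[\w ]+</p>~';

// Replace the whitespace plus the second paragraph with nothing.
$cleaned = preg_replace($pattern, '', $DESC);

echo $cleaned, PHP_EOL; // <p>erster Absatz</p>
```

Because `[\w ]+` only allows word characters and spaces, a second paragraph containing punctuation would not be matched. Something like `[^<]+` may be a safer character class for real feed text, though that is an assumption about what the feed contains.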

CAustin
0

Try this:

$DESC ='<p>erster Absatz</p>
 <p>zweiter Absatz</p>';

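// Match from the first closing </p> through the end of the next <p> block,
// then put back only the closing </p>. The backslashes before < and > are not needed.
$DESC = preg_replace('#</p>[^<]*<p[^>]*>(.*?)</p>#i', '</p>', $DESC);
echo $DESC; // <p>erster Absatz</p>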
CoursesWeb