Regex to replace content of second <p>-tag

Question

Output (var $DESC)

 <p>erster Absatz</p>
 <p>zweiter Absatz</p>

Regex (PHP)

 preg_replace("<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>{2}", '', $DESC)

I would like to delete only the second <p> block, but this regex matches both. Thanks for any help.

Will it always be a `<p>` block followed immediately by another `<p>` block, or does it need to check for any element that occurs twice in a row? — CAustin, Sep 27 '17 at 18:47
Thanks, it will always be like this: a p-block followed by another p-block — Stiller Eugen, Sep 27 '17 at 18:48
Also, I didn't vote your question down, but I know why others did. Using regex to parse HTML (and other non-regular languages) is highly discouraged https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — CAustin, Sep 27 '17 at 18:50
What's a better solution? I get this string from a feed, cannot change anything there, and I don't need the second p block. — Stiller Eugen, Sep 27 '17 at 18:54
Assuming you can't use an HTML parser (for whatever reason, even though you're using PHP and that's **exactly** how you should approach this), you can use this `<p>(.*?)<\/p>` and grab the second result from your `preg_match_all`. But again, I would **highly discourage using regex for this** — ctwheels, Sep 27 '17 at 18:54
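
For reference, the parser-based approach recommended in the comments could look roughly like the sketch below. It is not code from the thread: the helper name `removeSecondParagraph` and the wrapper `<div>` are assumptions made for illustration, and it relies on PHP's bundled DOM extension. The `<?xml encoding="UTF-8">` prefix is a common workaround so that `loadHTML` treats the fragment as UTF-8.

```php
<?php
// Hypothetical helper (not from the thread): removes the second <p> of an HTML fragment.
function removeSecondParagraph(string $html): string
{
    $dom = new DOMDocument();

    // Silence libxml warnings about fragment markup, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so it can be serialized back without <html>/<body>.
    $dom->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Drop the second <p>, if there is one.
    $paragraphs = $dom->getElementsByTagName('p');
    if ($paragraphs->length >= 2) {
        $second = $paragraphs->item(1);
        $second->parentNode->removeChild($second);
    }

    // Serialize only the children of the wrapper <div> (the first div in document order).
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }

    // Trim the whitespace that was left between the two paragraphs.
    return trim($out);
}

$DESC = "<p>erster Absatz</p>\n <p>zweiter Absatz</p>";
echo removeSecondParagraph($DESC);
```

Unlike a regex, this should keep working if the paragraphs gain attributes or contain nested inline tags, at the cost of a few more lines.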

score 2 · Answer 1 · answered Sep 27 '17 at 18:55

Normally I would just tell you to use an HTML parser instead of regex, but since your requirement is so specific, this can actually be accomplished with regex quite safely.

(?<=<\/p>)\s+<p>[\w ]+<\/p>

https://regex101.com/r/Yqaajy/6

Explanation:

(?<=<\/p>) - Make sure the rest of the pattern is preceded by a </p> ending tag (positive lookbehind).

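\s+ - One or more whitespace characters. Since \s also matches line breaks, this covers the newline and indentation between the two blocks. The single-line (s) modifier only changes how . matches, so it has no effect on this pattern.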

<p>[\w ]+<\/p> - A paragraph block containing one or more word characters (digits, letters, and underscore) and spaces.
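
A minimal sketch of how this pattern might be applied in PHP, using the sample input from the question. The `/` delimiters and the limit of 1 passed to `preg_replace` are assumptions added here, not part of the answer itself.

```php
<?php
// Sample input from the question: two <p> blocks separated by a line break.
$DESC = "<p>erster Absatz</p>\n <p>zweiter Absatz</p>";

// The pattern from the answer, wrapped in '/' delimiters (hence the escaped slashes).
$pattern = '/(?<=<\/p>)\s+<p>[\w ]+<\/p>/';

// Limit of 1: remove only the first <p> block that directly follows a closing </p>.
$result = preg_replace($pattern, '', $DESC, 1);

echo $result; // <p>erster Absatz</p>
```

Note that `[\w ]+` only accepts word characters and spaces, so a second paragraph containing punctuation or nested tags would not match and would be left in place.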

score 0 · Answer 2 · answered Sep 27 '17 at 18:58

Try this:

$DESC ='<p>erster Absatz</p>
 <p>zweiter Absatz</p>';

$DESC = preg_replace('#\</p\>[^\<]*\<p[^\>]*\>(.*?)\</p\>#i', '</p>', $DESC);
echo $DESC; // <p>erster Absatz</p>

answered Sep 27 '17 at 18:58

CoursesWeb
