PHP Regex to remove last paragraph (having attributes) and contents

Question

My question is similar to this question asked on Stack Overflow, with one difference.

I have the following stored in a MySQL table:

<p align="justify">First paragraph</p>
<p>Second paragraph</p>
<p>Third paragraph</p>
<div class="item">
<p>Some paragraph here</p>
<p><strong><u>Specs</u>:</strong><br /><br /><strong>Weight:</strong> 10kg<br /><br /><strong>LxWxH:</strong> 5mx1mx40cm</p>
<p align="justify">second last para</p>
<p align="justify">This is the paragraph I am trying to remove with regex.</p>
</div>

I'm trying to remove the last paragraph (tags and content) from every row in the table. The top answer to the linked question suggests the following regex:

preg_replace('~(.*)<p>.*?</p>~', '$1', $html)

The difference from the linked question is that my last paragraph tag may or may not have the attribute align="justify". If the last paragraph has this attribute, the suggested solution instead removes the last paragraph that has no attributes. So I am struggling to find a way to remove the last paragraph regardless of whether it has attributes.

Possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — Lucas Trzesniewski, Jan 02 '16 at 13:57
@LucasTrzesniewski Thanks for the link. Although, I didn't understand it completely, I have bookmarked it. — Dr. Atul Tiwari, Jan 02 '16 at 14:23
The link basically says you should use the right tool for the job. You need a HTML parser/DOM manipulation library here. Using regular expressions is brittle - you can do much better and easier with DOM (or XPath, or CSS selectors). — Lucas Trzesniewski, Jan 02 '16 at 14:29
@LucasTrzesniewski Thanks for simplification. I will read about HTML parser/DOM manipulation. — Dr. Atul Tiwari, Jan 02 '16 at 14:34
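
For readers following up on the DOM suggestion in the comments above, here is a minimal sketch using PHP's built-in DOMDocument. The sample fragment, the wrapper `<div id="root">` and the variable names are assumptions made for illustration; they are not from the question.

```php
<?php
// Hypothetical fragment modelled on the rows described in the question.
$html = '<p>First paragraph</p><div class="item">'
      . '<p align="justify">second last para</p>'
      . '<p align="justify">Remove me</p></div>';

// Silence libxml warnings about fragment markup.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Wrap the fragment in a single root element (an assumption of this sketch)
// so that no <html>/<body> wrapper or doctype is added.
$doc->loadHTML(
    '<div id="root">' . $html . '</div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);
libxml_clear_errors();

// getElementsByTagName returns nodes in document order,
// so the last item is the last paragraph, with or without attributes.
$paragraphs = $doc->getElementsByTagName('p');
if ($paragraphs->length > 0) {
    $last = $paragraphs->item($paragraphs->length - 1);
    $last->parentNode->removeChild($last);
}

// Serialize the wrapper's children back into an HTML fragment.
$domResult = '';
foreach ($doc->documentElement->childNodes as $child) {
    $domResult .= $doc->saveHTML($child);
}

echo $domResult;
```

Because the paragraph is selected by tag name rather than by text pattern, attributes on the tag make no difference here.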

Giuseppe Ricupero · Accepted Answer · 2016-01-02T14:21:30.673

2

Change the regex to:

preg_replace('~(.*)<p[^>]*>.*</p>\R?~s', '$1', $html)


Regex breakdown

~           # Opening regex delimiter
  (.*)      # Capture everything up to the last '<p ...>' tag
            # (greedy: it matches to the end, then backtracks)
  <p[^>]*>  # Match an opening '<p>' tag, with or without attributes;
            # '[^>]*' matches any characters other than '>'
  .*        # Match the paragraph content up to the closing '</p>'
  </p>      # Match the literal '</p>'
  \R?       # Match an optional trailing newline (\r\n, \r or \n) so it is removed too
~s          # Closing regex delimiter with the 's' (DOTALL) flag
            # (with 's', '.' also matches newlines)
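
As a quick check of the breakdown above, the pattern can be run against a shortened sample. The input below is a hypothetical fragment modelled on the question's markup, and the variable names are illustrative only.

```php
<?php
// Hypothetical sample input modelled on the question's markup.
$html = <<<HTML
<p>First paragraph</p>
<div class="item">
<p align="justify">second last para</p>
<p align="justify">This is the paragraph I am trying to remove with regex.</p>
</div>
HTML;

// Same pattern as in the answer: the greedy (.*) backtracks to the last
// '<p ...>' tag, and '$1' keeps everything that came before it.
$regexResult = preg_replace('~(.*)<p[^>]*>.*</p>\R?~s', '$1', $html);

echo $regexResult;
```

Only the final paragraph is dropped; the closing `</div>` after it is kept because it lies outside the match. One caveat: `<p[^>]*>` also matches other tags whose names start with "p", such as `<pre>`, so `<p\b[^>]*>` may be a safer variant if such tags can appear after the last paragraph.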

edited Jan 02 '16 at 14:21

answered Jan 02 '16 at 14:04


Thanks. It worked. I think you need to edit the answer and remove this text from regex => `**strong text**` – Dr. Atul Tiwari Jan 02 '16 at 14:20
@Dr.AtulTiwari: thanks, strangely it happens when i paste something! – Giuseppe Ricupero Jan 02 '16 at 14:22
