How do I remove empty <p> elements?

Question

I have the following regex:

$html = '<p></p><p>Lorem ispum...</p><p>  </p><p>;nbsp</p>';
$pattern = "/<p[^>]*><\\/p[^>]*>/";
echo preg_replace($pattern, '', $html );

This only removes the <p> tag if it's actually empty, i.e. <p></p>. How do I remove it if it has some other invisible copy in it, such as &nbsp;?

Try `echo preg_replace('~<p>(?>\s|(?R))*</p>~u', '', html_entity_decode($html));` - this should work if you have correct HTML entities. If not, you will need to replace them "manually" (with a list, perhaps). – Wiktor Stribiżew, Jan 27 '16 at 08:20

score 0 · Answer 1 · edited May 23 '17 at 11:52

There are several possible kinds of whitespace and even more possibilities for "empty" (e.g., is  empty? Or not?).

Also consider the possibility of having  or .

Much depends on where the text comes from. Microsoft Word will output  's in some circumstances (I could and did unremember them -- sorry).

A reasonable possibility for now might be to use a regex such as '#<p>(\\s|&nbsp;)*</p>#mis' to match paragraphs that contain only whitespace, including multiple empty lines.

But keep in mind that this kind of requirement tends to rapidly become unreasonable - for example, the class part might force you to use '#<p[^>]*>(\\s|&nbsp;)*</p>#mis' and so on - so you might want to start looking into an XML parser instead.
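
To make the parser suggestion concrete, here is a minimal sketch using PHP's bundled DOM extension. The function name removeEmptyParagraphs and the definition of "empty" (no child elements, and only whitespace or non-breaking spaces as text) are assumptions made for illustration, not something specified in the question; adjust the XPath test if, for example, a paragraph holding only a line break should also count as empty.

```php
<?php
// Hypothetical helper: drops <p> elements that have no child elements and
// whose text is only whitespace and/or non-breaking spaces (U+00A0).
function removeEmptyParagraphs(string $html): string
{
    $doc = new DOMDocument();

    // Silence libxml warnings about imperfect markup while parsing.
    $previous = libxml_use_internal_errors(true);
    // The leading XML declaration is a common hint so the input is read as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // translate() turns U+00A0 into a plain space; normalize-space() then
    // trims it away, so an "invisible" paragraph yields an empty string.
    $query = "//p[not(*) and normalize-space(translate(., '\u{00A0}', ' ')) = '']";

    // Copy the result list into an array before removing nodes from the tree.
    foreach (iterator_to_array($xpath->query($query)) as $p) {
        $p->parentNode->removeChild($p);
    }

    // loadHTML() wraps the fragment in <html><body>; serialize only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeEmptyParagraphs('<p></p><p class="x">Lorem ipsum...</p><p>  </p><p>&nbsp;</p>'), PHP_EOL;
```

Because the parser decodes entities and handles attributes itself, cases such as a class attribute on the tag need no special handling here. Note that the serializer may normalize the remaining markup slightly (for instance, adding line breaks between block elements).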

score 0 · Answer 2 · answered Jan 27 '16 at 08:24


I assume by backspace, you mean whitespace, and that ;nbsp& should be &nbsp;, and propose:

$pattern = "/<p[^>]*>(\s|&nbsp;)*<\\/p[^>]*>/";

\s matches any whitespace character.

The pattern matches \s OR (|) &nbsp; ANY (*) number of times inside the <p> tags.
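
As a quick check, the pattern can be run against input like the one in the question. This is only a sketch: the sample string below assumes the entity is written correctly as &nbsp; (the question's own sample has it garbled), and the variable names are illustrative.

```php
<?php
// Sample input modelled on the question, with the entity spelled &nbsp; (assumption).
$html = '<p></p><p>Lorem ipsum...</p><p>  </p><p>&nbsp;</p>';

// A <p> tag (optionally with attributes) whose content is only
// whitespace characters and/or literal &nbsp; entities.
$pattern = '/<p[^>]*>(\s|&nbsp;)*<\/p[^>]*>/';

$result = preg_replace($pattern, '', $html);
echo $result, PHP_EOL; // prints: <p>Lorem ipsum...</p>
```

The pattern only recognizes the literal text &nbsp;, so an already decoded non-breaking space character or other entities would need to be added to the alternation.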

answered Jan 27 '16 at 08:24

Richard Tingstad


Note that
will not be removed in this case. – Richard Tingstad Jan 27 '16 at 08:25
