How can I remove <br/> if no text comes before or after it?

For instance,

<p><br/>hello</p>
<p>hello<br/></p>

they should be rewritten like this,

<p>hello</p>
<p>hello</p>

Should I use DOMXPath, or would a regex be better?

(Note: I posted earlier about removing <p><br/></p> with DOMXPath, and then I came across this issue!)

EDIT:

If I have this in the input,

$content = '<p><br/>hello<br/>hello<br/></p>';

then it should be

<p>hello<br/>hello</p>
Run

2 Answers

To select the mentioned br you can use:

 "//p[node()[1][self::br]]/br[1] | //p[node()[last()][self::br]]/br[last()]"

or, (maybe) faster:

 "//p[br]/node()[self::br and (position()=1 or position()=last())]"

Both expressions select a br only when it is the first (or last) child node of its p.

This will select br such as:

<p><br/>hello</p>
<p>hello<br/></p>

and first and last br like in:

<p><br/>hello<br/>hello<br/></p>

not middle br like in:

<p>hello<br/>hello</p>
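For what it's worth, here is a minimal sketch of how the second expression might be applied to the input from the question's EDIT. The helper name stripEdgeBreaks is made up for illustration, and the sketch assumes that loadHTML() wraps the fragment in html/body elements, so only the children of body are serialized back:

```php
<?php
// Hypothetical helper (not part of the answer): removes a <br> only when it
// is the first or last child node of its <p>.
function stripEdgeBreaks(string $html): string
{
    $doc = new DOMDocument();
    // Assumption: loadHTML() wraps the fragment in <html><body>...</body></html>.
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Second expression from above: position() counts all child nodes of p,
    // so a br in the middle of the paragraph is not matched.
    $expr = '//p[br]/node()[self::br and (position()=1 or position()=last())]';

    // The query result is evaluated once, before any node is removed.
    foreach ($xpath->query($expr) as $br) {
        $br->parentNode->removeChild($br);
    }

    // Serialize only the children of <body>, not the wrapper the parser added.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Input from the question's EDIT: only the middle br should remain.
echo stripEdgeBreaks('<p><br/>hello<br/>hello<br/></p>'), "\n";
```

Note that the serializer may write the remaining break as <br> rather than <br/>, since the document is saved as HTML.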

PS: to also select the first br in a pair like <br/><br/>:

"//br[following::node()[1][self::br]]"
Emiliano Poggi

In case you want some code, I got it working like this. It uses a slightly modified version of @empo's XPath and shows the removal of the matches as well as some more test cases:

$html = <<<EOD
<p><br/>hello</p>
<p>hello<br/></p>
<p>hello<br/>Chello</p>
<p>hello <i>molly</i><br/></p>
<p>okidoki</p>
EOD;

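$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Match only a br that is itself the first or last child node of its p.
// ('//p[...]/br' would also remove middle brs from a matching paragraph.)
$nodes = $xpath->query('//p/br[not(preceding-sibling::node()) or not(following-sibling::node())]');
// The query result is evaluated up front, so removing while iterating is safe.
foreach ($nodes as $node) {
    $node->parentNode->removeChild($node);
}
var_dump($doc->saveHTML());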
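One caveat worth sketching: if the parser keeps a whitespace-only text node next to a br (for example a space or newline right after the opening p tag), that br is no longer the first or last child node, and a strict first/last test will skip it. The variant below is only a sketch under that assumption. The $meaningful predicate is my own naming, and it treats whitespace-only text siblings as if they were absent:

```php
<?php
// Hypothetical predicate: a sibling "counts" unless it is whitespace-only text.
$meaningful = "not(self::text()) or normalize-space() != ''";

// Select a br in a p that has no meaningful sibling before it, or none after it.
$expr = "//p/br[not(preceding-sibling::node()[{$meaningful}])"
      . " or not(following-sibling::node()[{$meaningful}])]";

$doc = new DOMDocument();
// Note the space after <p> and the newline before </p>.
$doc->loadHTML("<p> <br/>hello<br/>world<br/>\n</p>");
$xpath = new DOMXPath($doc);

foreach ($xpath->query($expr) as $br) {
    $br->parentNode->removeChild($br);
}

// Count the br elements left in the document; only the middle one should survive.
$left = $doc->getElementsByTagName('br')->length;
echo $left, "\n";
```

Whether the whitespace nodes are actually kept depends on the libxml version behind DOMDocument, so this expression is written to behave the same either way.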
hakre