I'm trying to remove all <br> tags before my text.
So I have this:
<p>
<br/><br/>When the battle is on between contestants in a talent show, it gets really competitive when down to the last four. X-FactorUSAcontestant Marcus Canty knows this all too well as this is the stage he was voted off of the show earlier this year. <br/><br/>
</p>
I want to get rid of the first two <br/> tags, and I'd also want to get rid of them if there were more than two.
I would prefer to use XPath as I'm already using it. At the moment I have this:
foreach ($xpath->query('//br[not(preceding::text())]') as $node) {
    $node->parentNode->removeChild($node);
}
For some reason on this particular page it doesn't seem to be working.
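One possible explanation, offered as a guess rather than a diagnosis: `preceding::text()` also matches whitespace-only text nodes, such as the newline right after `<p>` in the sample above, so a `<br>` that looks like it comes first may still have a text node before it. Below is a minimal sketch that ignores whitespace-only nodes with `normalize-space()`. The `$html` string is a hypothetical, shortened version of the sample, and `libxml_use_internal_errors()` is used here in place of `@`.

```php
<?php
// Hypothetical sample input, modelled on the snippet in the question.
$html = "<p>\n<br/><br/>When the battle is on.<br/><br/>\n</p>";

$doc = new DOMDocument();
// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Match <br> elements that have no non-whitespace text node before them.
$query = '//br[not(preceding::text()[normalize-space()])]';

// Copy the result to an array first, then remove each matched node.
foreach (iterator_to_array($xpath->query($query)) as $node) {
    $node->parentNode->removeChild($node);
}

// Count the <br> elements that are left in the document.
$remaining = $xpath->query('//br')->length;
```

Two caveats: the `preceding` axis looks at the whole document, so this only strips `<br>` elements that come before the very first piece of real text, and `normalize-space()` does not treat a non-breaking space as whitespace, so a leading `&nbsp;` would still count as text.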
UPDATE
Originally the question was why there were <br> tags at the start of the document when my XPath should be getting rid of them (see below). I applied some regex to see if that worked, which revealed the doctype you see now. I thought the doctype was somehow causing my original problem, but it just wasn't being shown until now. This content is what I imported from Blogger and am currently manipulating to fit a new blog.
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"><br><br>
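For context on where that doctype comes from: `DOMDocument::saveHTML()` called without an argument serialises the whole document, including the doctype and the `<html>`/`<body>` wrappers that `loadHTML()` adds around a fragment. A minimal sketch of one way around it, assuming only the contents of `<body>` are wanted, is to serialise each child of `<body>` separately. The input string here is hypothetical.

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true);
// Hypothetical fragment with no doctype, html or body of its own.
$doc->loadHTML('<p>Hello<br>world</p>');
libxml_clear_errors();

// Whole-document output: doctype and wrapper elements are included.
$whole = $doc->saveHTML();

// Fragment output: serialise only the children of <body>.
$body = $doc->getElementsByTagName('body')->item(0);
$fragment = '';
foreach ($body->childNodes as $child) {
    $fragment .= $doc->saveHTML($child);
}
```

With this approach the later `strip_tags()` and `ltrim()` calls never see the doctype in the first place.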
Here's my code:
global $post;
$postTime = $post->post_date;
$postTime = strtotime($postTime);
$startDate = "2014/01/16";
if ($postTime < strtotime($startDate)) {
    $html = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//br[not(preceding::text())]') as $node) {
        $node->parentNode->removeChild($node);
    }
    $nodes = $xpath->query('//a[string-length(.) = 0]');
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }
    $nodes = $xpath->query('//*[not(text() or node() or self::br)]');
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }
    remove_filter('the_content', 'wpautop');
    $content = $doc->saveHTML();
    $content = ltrim($content, '<br>');
    $content = strip_tags($content, '<br> <a> <iframe>');
    $content = preg_replace(array('/(<br\s*\/?>\s*){1,}/'), array('<br/><br/>'), $content);
    $content = str_replace('&nbsp;', ' ', $content);
    $content = "<p>".implode("</p>\n\n<p>", preg_split('/\n(?:\s*\n)+/', $content))."</p>";
    return $content;
}
Help appreciated.
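A side note on the `ltrim($content, '<br>')` line: the second argument of `ltrim()` is a list of characters, not a substring, so it strips any leading run of `<`, `b`, `r` and `>` rather than whole `<br>` tags. The sketch below shows the difference and a regex-based alternative. `strip_leading_br()` is a hypothetical helper name, not something from WordPress or PHP.

```php
<?php
// ltrim() treats '<br>' as the character set {<, b, r, >}.
$mangled = ltrim('<br><b>bold</b>', '<br>');
// $mangled is now 'old</b>': the "<b>" tag and the "b" of "bold" were eaten too.

/**
 * Hypothetical helper: remove any run of leading <br>, <br/> or <br />
 * tags (with optional whitespace between them) from the start of a string.
 */
function strip_leading_br(string $html): string
{
    return preg_replace('/^(?:\s*<br\s*\/?>)+/i', '', $html);
}

$clean = strip_leading_br('<br><br/> <br />text<br>');
// $clean is 'text<br>': only the leading tags are removed.
```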
]*/> actually states that whatever content between ` ` is ignored... – Willem Van Onsem Jan 21 '14 at 14:09
Originally the question was why there were `<br>` tags at the start of the document when my XPath should be getting rid of them (see above). I applied some regex to see if that worked, which revealed the doctype you see now. I thought the doctype was somehow causing my original problem, but it just wasn't being shown until now. This content is what I imported from Blogger and am currently manipulating to fit a new blog. – UzumakiDev Jan 21 '14 at 14:48