Parse it with a DOM (Document Object Model) parser, check the text content of each node, and remove the nodes that fail your criteria (the node is a script tag, contains only whitespace, is an iframe, etc.).
There is quite a lot of sample code in the comments section as well.
Here's some code that does something like that (adapted from assorted cut-and-paste snippets on php.net):
<?php
$sampleHTML = "
<p> </p>
<p> <p>
<p><br/></p>
<p><br /></p>
<span>Non-empty span<p id='NestedEmptyElement'></p></span>
";
$doc = new DOMDocument();
$doc->loadHTML($sampleHTML);
// getElementsByTagName('*') matches every element in the document.
$domNodeList = $doc->getElementsByTagName('*');
$domElemsToRemove = array();
foreach ($domNodeList as $domElement) {
    $domElement->normalize();
    // Treat spaces, tabs, newlines and UTF-8 non-breaking spaces (\xc2\xa0) as empty.
    if (trim($domElement->textContent, "\xc2\xa0 \n\t") === "") {
        $domElemsToRemove[] = $domElement;
    }
}
// Remove in a second pass so the live node list is not modified while iterating it.
foreach ($domElemsToRemove as $domElement) {
    // Skip nodes that are no longer attached to a parent.
    if ($domElement->parentNode !== null) {
        $domElement->parentNode->removeChild($domElement);
    }
    // There's a better way to do this, it's recursive.
}
// Print whatever is left inside <body>.
$body = $doc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $childNode) {
    echo trim($childNode->C14N());
}
echo "\n\n";
Then we run it:
$ php foo.php -v
<span>Non-empty span</span>
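The comment in the removal loop mentions that a recursive version would be nicer. Here is a rough sketch of what that could look like. The function name `pruneEmptyElements` is made up for illustration, and it uses the same "text content is only whitespace" test as the code above, so treat it as one possible approach, not the definitive one.

```php
<?php
// Hypothetical helper (not part of the DOM extension): walks the tree and
// removes every descendant element whose text content is only whitespace.
function pruneEmptyElements(DOMNode $node): void
{
    // Copy the live child list first; removing nodes while iterating the
    // live DOMNodeList would skip siblings.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if (!$child instanceof DOMElement) {
            continue; // leave text nodes, comments, etc. alone
        }
        if (trim($child->textContent, "\xc2\xa0 \n\t\r") === '') {
            // Removing the element also drops everything nested inside it.
            $node->removeChild($child);
        } else {
            // The element has real text somewhere, so check its children.
            pruneEmptyElements($child);
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML("<p> </p><p><br/></p><span>Non-empty span<p id='NestedEmptyElement'></p></span>");
$body = $doc->getElementsByTagName('body')->item(0);
pruneEmptyElements($body);

foreach ($body->childNodes as $child) {
    echo trim($doc->saveHTML($child));
}
echo "\n";
```

Because each empty element is removed by its own parent during the walk, there is no second pass and no need to worry about nodes that were already detached along with an ancestor.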