
I have a string like this:

<p>
This is some text
</p>

<p>
This is some text
</p>
<p>
This is some text
</p>

<blockquote data-id="1">
    This is some text

    <blockquote data-id="2">
        This is some text
    </blockquote>
</blockquote>

<blockquote data-id="3">
    <blockquote data-id="4">
        This is some text

        <blockquote data-id="5">
            This is some text
        </blockquote>
    </blockquote>
    This is some text
</blockquote>

<blockquote data-id="6">
    This is some text
</blockquote>

I want to keep the outermost blockquote tags but delete their contents (including any nested blockquotes). So I want to convert the above to this:

<p>
This is some text
</p>

<p>
This is some text
</p>
<p>
This is some text
</p>

<blockquote data-id="1"></blockquote>

<blockquote data-id="3"></blockquote>

<blockquote data-id="6"></blockquote>

What is an efficient way to do this in PHP?

Nate

@PaulCrovella Good point, I'll update it since what I really wanted was to completely remove content from the nodes. – Nate Dec 14 '14 at 17:13

2 Answers


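Many ways to skin this cat. I'd give the string a dummy root node, ditch all nodes matching the XPath expression /root/blockquote/text() | /root/blockquote/*, then rebuild the string from the root's children. Because this uses loadXML(), the string has to be well-formed XML once it is wrapped in the dummy root.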


Example:

$string = <<<'STRING'
<p>
This is some text
</p>

<p>
This is some text
</p>
<p>
This is some text
</p>

<blockquote data-id="1">
    This is some text

    <blockquote data-id="2">
        This is some text
    </blockquote>
</blockquote>

<blockquote data-id="3">
    <blockquote data-id="4">
        This is some text

        <blockquote data-id="5">
            This is some text
        </blockquote>
    </blockquote>
    This is some text
</blockquote>

<blockquote data-id="6">
    This is some text
</blockquote>
STRING;

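$dom = new DOMDocument();
// loadXML() needs a single root element, so wrap the fragment in a dummy <root>
$dom->loadXML("<root>$string</root>");
$xpath = new DOMXPath($dom);

// Remove the text nodes and child elements directly inside each top-level <blockquote>
// (removing a nested <blockquote> also removes everything inside it)
foreach ($xpath->query('/root/blockquote/text() | /root/blockquote/*') as $node) {
    $node->parentNode->removeChild($node);
}

// Rebuild the string from the dummy root's children, leaving out <root> itself
$string = '';
foreach ($dom->documentElement->childNodes as $node) {
    $string .= $dom->saveHTML($node);
}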

echo $string;

Output:

<p>
This is some text
</p>

<p>
This is some text
</p>
<p>
This is some text
</p>

<blockquote data-id="1"></blockquote>

<blockquote data-id="3"></blockquote>

<blockquote data-id="6"></blockquote>
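If you need this in more than one place, the same dummy-root approach can be wrapped in a function. This is a minimal sketch, not a canonical implementation: the name emptyTopLevelBlockquotes() is hypothetical, and it assumes the input is well-formed XML once wrapped in <root>, since loadXML() rejects markup such as an unclosed <br> or an &nbsp; entity.

```php
<?php
// Hypothetical helper wrapping the dummy-root + XPath approach above.
// Assumes $fragment is well-formed XML once wrapped in <root>; loadXML()
// returns false (with a warning) for HTML-only markup like <br> or &nbsp;.
function emptyTopLevelBlockquotes(string $fragment): string
{
    $dom = new DOMDocument();
    if (!$dom->loadXML("<root>$fragment</root>")) {
        throw new InvalidArgumentException('Fragment is not well-formed XML');
    }
    $xpath = new DOMXPath($dom);

    // Text nodes and elements directly inside a top-level <blockquote>;
    // removing an element also removes everything nested inside it.
    $nodes = $xpath->query('/root/blockquote/text() | /root/blockquote/*');
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize the root's children without the <root> wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Usage: the nested blockquote and both text nodes inside data-id="1" are dropped.
echo emptyTopLevelBlockquotes('<p>Hi</p><blockquote data-id="1">x<blockquote data-id="2">y</blockquote></blockquote>');
```

Depending on the PHP/libxml version, saveHTML() may emit newlines between elements, so compare results with whitespace between tags normalized.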
user3942918


Shortly after posting my question, it occurred to me that DOMDocument would work well for this problem (although there may be a better solution).

This is what I came up with:

$html = '<p>
This is some text
</p>

<p>
This is some text
</p>
<p>
This is some text
</p>

<blockquote data-id="1">
    This is some text

    <blockquote data-id="2">
        This is some text
    </blockquote>
</blockquote>

<blockquote data-id="3">
    <blockquote data-id="4">
        This is some text

        <blockquote data-id="5">
            This is some text
        </blockquote>
    </blockquote>
    This is some text
</blockquote>

<blockquote data-id="6">
    This is some text
</blockquote>';


libxml_use_internal_errors(true); // keep libxml parse warnings from being emitted as PHP warnings
$dom = new \DOMDocument();
$dom->loadHTML($html); // pass the HTML string

$xpath = new \DOMXPath($dom); // pass the appropriate DomDocument object to the constructor

foreach ($xpath->query('//blockquote') as $node) {
    /** @var \DOMElement $node */
    $node->nodeValue = '';
}

echo domInnerHtml($xpath->query('//body')->item(0));


/**
 * Returns the inner HTML of a DOMNode
 *
 * @link http://stackoverflow.com/questions/2087103/innerhtml-in-phps-domdocument
 * @param DOMNode $element
 * @return string
 */
function domInnerHtml(DOMNode $element) {
    $innerHtml = '';
    $children  = $element->childNodes;

    foreach ($children as $child) {
        $innerHtml .= $element->ownerDocument->saveHTML($child);
    }

    return $innerHtml;
}

The output is:

<p>
This is some text
</p>

<p>
This is some text
</p>
<p>
This is some text
</p>

<blockquote data-id="1"></blockquote>

<blockquote data-id="3"></blockquote>

<blockquote data-id="6"></blockquote>
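The loop above assigns nodeValue on every blockquote, including nested ones that are already discarded once their outer blockquote is emptied. A variation is to select only the outermost blockquotes with the XPath predicate not(ancestor::blockquote) and remove their children explicitly. This is a sketch under stated assumptions rather than a drop-in replacement: clearOuterBlockquotes() is a hypothetical name, and it assumes the input is an HTML fragment whose top-level nodes end up inside the <body> element that loadHTML() adds.

```php
<?php
// Hypothetical variant of the loadHTML() approach above.
// Assumes $html is an HTML fragment that loadHTML() places inside <body>.
function clearOuterBlockquotes(string $html): string
{
    // Collect libxml parse warnings internally, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Blockquotes with no blockquote ancestor, i.e. only the outermost ones.
    foreach ($xpath->query('//blockquote[not(ancestor::blockquote)]') as $quote) {
        // Remove every child node (text and nested elements) one at a time.
        while ($quote->firstChild !== null) {
            $quote->removeChild($quote->firstChild);
        }
    }

    // Inner HTML of <body>, the same idea as domInnerHtml() above.
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo clearOuterBlockquotes('<p>Hi</p><blockquote data-id="1">x<blockquote data-id="2">y</blockquote></blockquote><blockquote data-id="3">z</blockquote>');
```

Unlike loadXML(), loadHTML() tolerates HTML-only markup such as <br> or &nbsp;, which is why this route suits real-world input better; the trade-off is the implied <html>/<body> wrapper that has to be stripped on output.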
Nate