I have the following string:

<span style="font-size: 13px;">
   <span style="">
      <span style="">
         <span style="font-family: Roboto, sans-serif;">
            <span style="">
               Some text content
            </span>
         </span>
      </span>
   </span>
</span>

and I want to change this string to the following using PHP:

<span style="font-size: 13px;">
   <span style="font-family: Roboto, sans-serif;">
      Some text content
   </span>
</span>

I have no idea how to do that. When I try to use str_replace to replace the <span style="">, I don't know how to replace the matching </span> while keeping the content inside. My next problem is that I don't know exactly how many <span style=""> elements I have in my string. I also have more than one of these blocks in my string.

Thanks in advance for your help, and maybe sorry for my stupid question - I'm still learning.

Sherif
noten40565
  • Don't use regular expressions to parse HTML. Use `DOMDocument`. – Barmar Feb 15 '20 at 00:10
  • Does this answer your question? [How do you parse and process HTML/XML in PHP?](https://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) – user3783243 Feb 15 '20 at 00:12
  • @user3783243 I'll take a look at the answer and let you know, thanks for your response :) – noten40565 Feb 15 '20 at 00:16
  • Hi, and welcome to StackOverflow. I've provided you with a detailed answer on how to approach your problem below. Feel free to update your question if this isn't what you're looking for. – Sherif Feb 15 '20 at 02:22

2 Answers

This is easily done with a proper HTML parser. PHP has DOMDocument, which parses X/HTML into a Document Object Model (DOM) tree that you can then manipulate however you want.

The trick to solving this problem is to recursively traverse the DOM tree, visit each node, and replace the ones you don't want. To do this, I've written a short helper method by extending DOMDocument:

$html = <<<'HTML'
<span style="font-size: 13px;">
   <span style="">
      <span style="">
         <span style="font-family: Roboto, sans-serif;">
            <span style="">
               Some text content
            </span>
         </span>
      </span>
   </span>
</span>
HTML;

class MyDOMDocument extends DOMDocument {
    public function walk(DOMNode $node, $skipParent = false) {
        if (!$skipParent) {
            yield $node;
        }
        if ($node->hasChildNodes()) {
            foreach ($node->childNodes as $n) {
                yield from $this->walk($n);
            }
        }
    }
}

libxml_use_internal_errors(true);

$dom = new MyDOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$keep = $remove = [];

foreach ($dom->walk($dom->childNodes->item(0)) as $node) {
    if ($node->nodeName !== "span") { // we only care about span nodes
        continue;
    }
    // we'll get rid of all span nodes whose style attribute is missing or empty
    if (!$node->hasAttribute("style") || !strlen($node->getAttribute("style"))) {
        $remove[] = $node;
        foreach($node->childNodes as $child) {
            $keep[] = [$child, $node];
        }
    }
}

// first move each child in front of its wrapper span (this keeps the inner nodes
// in document order), then remove the wrappers, which are empty by then
foreach($keep as [$a, $b]) {
    $b->parentNode->insertBefore($a, $b);
}
foreach($remove as $a) {
    if ($a->parentNode) {
        $a->parentNode->removeChild($a);
    }
}

// Now we should have a rebuilt DOM tree with what we expect:
echo $dom->saveHTML();

Output:

<span style="font-size: 13px;">


         <span style="font-family: Roboto, sans-serif;">

               Some text content

         </span>


</span>
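
Since the question mentions several such blocks in one string, here is a minimal sketch of the same unwrapping idea packaged as a function. It is an alternative formulation, not the code above: `unwrapEmptySpans` is a hypothetical name, the temporary `<div>` wrapper is an assumption made so that multiple top-level blocks share one root, and the XPath expression also treats a whitespace-only style as empty.

```php
<?php
// Hypothetical helper: unwrap every <span> whose style attribute is
// missing or blank, keeping its children in place.
function unwrapEmptySpans(string $html): string
{
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // The temporary <div> gives the fragment a single root element,
    // so several top-level blocks are all processed.
    $dom->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Copy the matches into an array before modifying the tree.
    $spans = iterator_to_array(
        $xpath->query('//span[not(@style) or normalize-space(@style) = ""]')
    );

    foreach ($spans as $span) {
        // Move each child in front of the span, then drop the empty span.
        while ($span->firstChild !== null) {
            $span->parentNode->insertBefore($span->firstChild, $span);
        }
        $span->parentNode->removeChild($span);
    }

    // Serialize only the wrapper's children, i.e. the original fragment.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Usage: pass the whole string, however many blocks it contains.
$demo = '<span style="font-size: 13px;"><span style="">'
      . '<span style="font-family: Roboto, sans-serif;"><span style="">'
      . 'Some text content</span></span></span></span>';
echo unwrapEmptySpans($demo), PHP_EOL;
```

Whitespace between the tags is kept as text nodes, so the indentation of the removed spans stays in the result, just as in the output above.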
Sherif

For a more general way to modify an HTML document, take a look at XSLT (Extensible Stylesheet Language Transformations). PHP ships an XSL extension that provides the XSLTProcessor class.

You put your transform rules in an XML document, the stylesheet (transform.xsl in the PHP below):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html" indent="yes"/>

    <!-- unwrap any element whose style attribute is empty: drop the tag, keep its children -->
    <xsl:template match="*[@style and string-length(./@style) = 0]">
        <xsl:apply-templates />
    </xsl:template>

    <!-- catch all to copy any elements that aren't matched in other templates -->
    <xsl:template match="*">
        <xsl:copy>
            <!-- copy the attributes of the element -->
            <xsl:copy-of select="@*" />
            <!-- continue applying templates to this element's children, text nodes included -->
            <xsl:apply-templates />
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Then your PHP:

$sourceHtml = new DOMDocument();
// load() expects well-formed XML; for HTML that is not well-formed, use loadHTMLFile() instead
$sourceHtml->load('source.html');

$xsl = new DOMDocument();
$xsl->load('transform.xsl');

$xsltProcessor = new XSLTProcessor;
$xsltProcessor->importStylesheet($xsl); // attach the xsl rules

echo $xsltProcessor->transformToXML($sourceHtml);

$transformedHtml = $xsltProcessor->transformToDoc($sourceHtml);
$transformedHtml->saveHTMLFile('transformed.html');

XSLT is very powerful for this kind of thing: you can write rules based on parent/sibling relationships and modify attributes and content accordingly.
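
To try the approach without separate files, the following is a minimal self-contained sketch. It assumes the xsl extension is enabled and that the input is well-formed XML. The inline stylesheet is a variant written for this illustration rather than the exact file above: it matches only span elements and also treats a missing or whitespace-only style as empty.

```php
<?php
// Inline stylesheet (illustrative variant, see the assumptions above).
$xslSource = <<<'XSL'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html"/>

    <!-- unwrap spans without a usable style: emit only their children -->
    <xsl:template match="span[not(@style) or normalize-space(@style) = '']">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- copy every other element, its attributes and its children -->
    <xsl:template match="*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
XSL;

$source = new DOMDocument();
$source->loadXML(
    '<span style="font-size: 13px;"><span style="">'
    . '<span style="font-family: Roboto, sans-serif;"><span style="">'
    . 'Some text content</span></span></span></span>'
);

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$processor = new XSLTProcessor();
$processor->importStylesheet($xsl);

// transformToXml() returns the result as a string (false on failure).
$result = $processor->transformToXml($source);
echo $result, PHP_EOL;
```

Text nodes survive because `<xsl:apply-templates/>` without a select attribute visits all child nodes, and XSLT's built-in rule for text copies it through unchanged.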

HorusKol