I'm trying to use masterminds/html5-php to manipulate some HTML5 documents. I don't know whether the input is a full HTML page or just a fragment of HTML code, so I'm manipulating it like this:

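        // HTML5 is Masterminds\HTML5 from masterminds/html5-php
        $html5 = new HTML5();
        $dom = $html5->loadHTML($html);

        // Fall back to the untouched input if parsing reported errors
        if ($html5->hasErrors()) {
            return $html;
        }

        // Nothing to remove without a selector
        if (empty($selector)) {
            return $html;
        }

        $domObject = new Zend_Dom_Query();
        $domObject->setDocument($dom);

        // Detach every element matched by the selector
        $domElements = $domObject->query($selector);
        foreach ($domElements as $domElement) {
            $domElement->parentNode->removeChild($domElement);
        }

        // Serialize the modified DOM back to a string
        return $html5->saveHTML($dom);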

My problem is that saveHTML() wraps the HTML code in a `<!DOCTYPE html>` declaration and `<html>` tags, so my final document ends up being full of `<html>` tags all over the place.

Is there a way to alter this behaviour? Perhaps by overriding the saveHTML method? Any hints are appreciated.

Here is the original HTML

<div class="language-currency-wrapper ">
    <div class="language-currency-block">
                            <span class="language">
                                        <img src="http://domain.gr/skin/frontend/be/ioweb/images/lang/English.jpg" alt="English" />
                    <span class="io-language-label hidden-xs">English</span>
                                    </span>
        <i class="icon-down-open-mini"></i>
    </div>
    <div class="language-currency-dropdown">
        <div class="form-language list">
            <div class="label">Your Language:</div>
            <a href="http://domain.gr/products/bmw/f32.html?___store=en&amp;___from_store=en"><img src="http://domain.gr/skin/frontend/be/ioweb/images/lang/English.jpg" alt="English" /><span class="hidden-xs">English</span></a>
            <a href="http://domain.gr/products/bmw/f32.html?___store=de&amp;___from_store=en"><img src="http://domain.gr/skin/frontend/be/ioweb/images/lang/German.jpg" alt="German" /><span class="hidden-xs">German</span></a>
            <a href="http://domain.gr/products/bmw/f32.html?___store=el&amp;___from_store=en"><img src="http://domain.gr/skin/frontend/be/ioweb/images/lang/Greek.jpg" alt="Greek" /><span class="hidden-xs">Greek</span></a>
        </div>
    </div>
    <div class="clearfix"></div>
</div>

Here is what saveHTML produces

<!DOCTYPE html>
<html><div class="language-currency-wrapper ">
    <div class="language-currency-dropdown">
        <div class="form-language list">
            <div class="label">Your Language:</div>
            <a href="http://domain.gr/products/bmw/f32.html?___store=en&amp;___from_store=en"><img src="http://domain.gr/skin/frontend/be/ioweb/images/lang/English.jpg" alt="English"><span class="hidden-xs">English</span></a>
            <a href="http://domain.gr/products/bmw/f32.html?___store=de&amp;___from_store=en"><img src="http://domain.gr/skin/frontend/be/ioweb/images/lang/German.jpg" alt="German"><span class="hidden-xs">German</span></a>
            <a href="http://domain.gr/products/bmw/f32.html?___store=el&amp;___from_store=en"><img src="http://domain.gr/skin/frontend/be/ioweb/images/lang/Greek.jpg" alt="Greek"><span class="hidden-xs">Greek</span></a>
        </div>
    </div>
    <div class="clearfix"></div>
</div>
</html>

Here is what I was expecting

<div class="language-currency-wrapper ">
    <div class="language-currency-dropdown">
        <div class="form-language list">
            <div class="label">Your Language:</div>
            <a href="http://domain.gr/products/bmw/f32.html?___store=en&amp;___from_store=en"><img src="http://domain.gr/skin/frontend/be/ioweb/images/lang/English.jpg" alt="English"><span class="hidden-xs">English</span></a>
            <a href="http://domain.gr/products/bmw/f32.html?___store=de&amp;___from_store=en"><img src="http://domain.gr/skin/frontend/be/ioweb/images/lang/German.jpg" alt="German"><span class="hidden-xs">German</span></a>
            <a href="http://domain.gr/products/bmw/f32.html?___store=el&amp;___from_store=en"><img src="http://domain.gr/skin/frontend/be/ioweb/images/lang/Greek.jpg" alt="Greek"><span class="hidden-xs">Greek</span></a>
        </div>
    </div>
    <div class="clearfix"></div>
</div>

For simplicity I'm removing the language block
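
One possible direction, as a minimal sketch rather than a confirmed fix: html5-php also offers `loadHTMLFragment()`, which returns a `DOMDocumentFragment`, and `saveHTML()` can be handed that fragment, in which case only its children should be serialized, without a doctype or `<html>` wrapper. The helper names `collectByClass` and `removeByClass`, the class-based matching (used here in place of `Zend_Dom_Query`), and the `stripos` check for a full page are all assumptions made for illustration.

```php
<?php
// Sketch only: assumes masterminds/html5 was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use Masterminds\HTML5;

// Hypothetical helper: collect elements whose class list contains $class.
function collectByClass(DOMNode $node, string $class, array &$found): void
{
    foreach ($node->childNodes as $child) {
        if (!$child instanceof DOMElement) {
            continue; // skip text, comments, doctype
        }
        $classes = preg_split('/\s+/', trim($child->getAttribute('class')));
        if (in_array($class, $classes, true)) {
            $found[] = $child; // no need to descend, the subtree is removed
        } else {
            collectByClass($child, $class, $found);
        }
    }
}

// Hypothetical helper: remove matching elements without adding a wrapper.
function removeByClass(string $html, string $class): string
{
    $html5 = new HTML5();

    // Crude heuristic (an assumption): input containing "<html" is treated
    // as a full page, everything else as a fragment.
    $isFullPage = stripos($html, '<html') !== false;
    $dom = $isFullPage
        ? $html5->loadHTML($html)
        : $html5->loadHTMLFragment($html);

    if ($html5->hasErrors()) {
        return $html; // same fallback as in the question
    }

    // Collect first, then remove, so the tree is not modified while walking it
    $found = [];
    collectByClass($dom, $class, $found);
    foreach ($found as $element) {
        $element->parentNode->removeChild($element);
    }

    // A DOMDocumentFragment is serialized as its children only.
    return $html5->saveHTML($dom);
}
```

With the markup above, this would be called as `removeByClass($html, 'language-currency-block')`. The walk uses plain DOM methods so that it behaves the same for a fragment and for a full document.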

gabtzi
  • Can you share your `DOM HTML String` and expected output? – Sahil Gulati Jun 23 '17 at 08:57
  • Just updated my question with example html code input/output and expected output – gabtzi Jun 23 '17 at 09:10
  • If I give you the expected result with `DOMDocument`, will that be okay? Do you only want to remove the `language-currency-block` div? – Sahil Gulati Jun 23 '17 at 09:13
  • I've tried DOMDocument before with the libxml options, and it wouldn't add the `<html>` or doctype tags; however, it was giving me a million other issues including, but not limited to, breaking the UTF-8 encoding even after converting to HTML entities, removing – gabtzi Jun 23 '17 at 09:18
  • Possible duplicate of [How to saveHTML of DOMDocument without HTML wrapper?](https://stackoverflow.com/questions/4879946/how-to-savehtml-of-domdocument-without-html-wrapper) – miken32 May 16 '19 at 03:31
  • The main difference in my question was about using a specific php library to achieve this so it's not really a duplicate. – gabtzi May 17 '19 at 04:17