Get DOM element string using PHP

Question

I have a set of html strings that can look like this:

<div id="myelementID" class="hello" data-foo="bar"> ... </div>

or

<div id="myelementID" class="world" data-this="that"> ... </div>

and so on; you get the idea. Apart from id="myelementID", none of the other attributes are fixed.

What I need is to extract the exact string of the opening <div> tag, e.g. <div id="myelementID" class="hello" data-foo="bar">, if an element with the ID "myelementID" exists.

As of now, I'm able to use DomDocument to check if the element exists:

        $dom = new DomDocument;
        $dom->validateOnParse = true;
        $internalErrors = libxml_use_internal_errors(true);
        $dom->loadHTML($html_string);
        libxml_use_internal_errors($internalErrors);
        $el = $dom->getElementById("myelementID");

From here, how can I get the element's HTML string? I'm open to using preg_match as well, which may be an even better solution.

Edit: Just to be clearer, I'm not looking for the content of the element. I'm looking for the string <div id="myelementID" etc="etc" this="that">. The problem is that it's not certain what attributes the element has, apart from the fact that its ID is "myelementID".

Possible duplicate of [How to get html code of DOMElement node?](http://stackoverflow.com/questions/12909787/how-to-get-html-code-of-domelement-node) — u_mulder, Jun 18 '16 at 11:40
I've read that thread already. It's not a duplicate, it's about a different issue, and there's no valid answer. — Mike Feng, Jun 18 '16 at 11:42
I think you can use regexp in that case yes. Something like `if(preg_match("#<div[^>]*>(.*)<\/div>#",$el,$match) > 0)` would be sufficient, with $match[1] having your content — jquiaios, Jun 18 '16 at 11:42
So why is it about a different issue? Have you tried `saveHTML`? What does it output, and what do you need? — u_mulder, Jun 18 '16 at 11:49
Not sure if it's because the question wasn't clear, but I'm not looking for the content of the element, but the string `<div id="myelementID" ...>` itself. — Mike Feng, Jun 18 '16 at 11:50

score 2 · Accepted Answer · RomanPerekhrest · 2016-06-18T11:58:45.663

Use the DOMNode::C14N method to canonicalize the node to a string, then substr and strpos to cut out the opening tag. Canonicalization sorts the attributes alphabetically, so their order may differ from the source:

...
$el = $dom->getElementById("myelementID");
$elString = $el->C14N();

var_dump(substr($elString, 0, strpos($elString, '>') + 1));

The output (for your example):

string(51) "<div class="hello" data-foo="bar" id="myelementID">"

http://php.net/manual/ru/domnode.c14n.php
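
For reference, here is a self-contained sketch of the same approach, combining the loading code from the question with the C14N call above. The function name `openingTagViaC14N` is made up for illustration. It assumes no attribute value contains a literal `>`, since the tag is cut at the first `>` in the canonical string.

```php
<?php
// Hypothetical helper (not from the original answer): returns the canonicalized
// opening tag of the element with the given id, or null if there is no such element.
function openingTagViaC14N(string $html, string $id): ?string
{
    $dom = new DOMDocument();
    $dom->validateOnParse = true;

    // Collect libxml warnings internally instead of emitting them, as in the question.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $el = $dom->getElementById($id);
    if ($el === null) {
        return null;
    }

    $canonical = $el->C14N();
    if ($canonical === false) {
        return null;
    }

    // Keep everything up to and including the first '>' (assumed to close the opening tag).
    return substr($canonical, 0, strpos($canonical, '>') + 1);
}

$html = '<div id="myelementID" class="hello" data-foo="bar"> ... </div>';
var_dump(openingTagViaC14N($html, 'myelementID'));
```

Because the attributes come back in canonical order, this is only suitable when an equivalent tag is acceptable rather than the byte-for-byte original.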

edited Jun 18 '16 at 11:58

answered Jun 18 '16 at 11:50


No no, you're reading the question wrong. I don't want the content of the element. I just want `<div id="myelementID" ...>` — Mike Feng, Jun 18 '16 at 11:53

score 1 · Answer 2 · answered Jun 18 '16 at 12:09

A very simple regex which works (tested on RegExr). The only downside is that an attribute value containing a > would end the match prematurely. Note that the pattern matches any tag with this id, not only a <div>.

<[^>]*\sid="myelementID"[^>]*>

A breakdown of the RegEx:

< the opening angle bracket of the tag (e.g. of <div)
[^>]* match any number of characters that are not >
\s matches a whitespace character (e.g. a space)
id="myelementID" matches the id of your target element
[^>]* match any number of characters that are not >
> the end of the <div> tag
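
A minimal sketch of how the pattern could be used with preg_match. The function name `openingTagViaRegex` is hypothetical, and running the id through `preg_quote` is an addition for safety that the answer itself does not include.

```php
<?php
// Hypothetical helper: returns the opening tag exactly as written in the source
// string, or null if no tag carries the given id.
function openingTagViaRegex(string $html, string $id): ?string
{
    // Same pattern as above; preg_quote escapes any regex metacharacters in the id.
    $pattern = '/<[^>]*\sid="' . preg_quote($id, '/') . '"[^>]*>/';

    return preg_match($pattern, $html, $match) === 1 ? $match[0] : null;
}

$html = '<div class="world" id="myelementID" data-this="that"> ... </div>';
$tag = openingTagViaRegex($html, 'myelementID');
```

Unlike the DOM-based approaches, this returns the attributes in their original order and with their original quoting, which is what the question asks for, at the cost of the `>` limitation mentioned above.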

Looks like @RomanPerekhrest got there first, but I'd already started so thought I might as well finish! — Peter Gordon, Jun 18 '16 at 12:10

score 0 · Answer 3 · answered Jun 18 '16 at 12:45

You can use the code below if you want to stick with DomDocument. It uses foreach() to iterate over the element's attributes and appends each attribute name and value to the $elemString variable.

$html_string = '<div id="myelementID" class="hello" data-foo="bar">...</div>';

$dom = new DomDocument;
$dom->loadHTML($html_string);
$el = $dom->getElementById("myelementID");

// getElementById() returns null when no element has that id
if ($el !== null)
{
    $elemString = "<div";
    // append every attribute as name="value", in document order
    foreach ($el->attributes as $attr)
    {
        $name = $attr->nodeName;
        $value = $attr->nodeValue;
        $elemString .= " {$name}=\"{$value}\"";
    }
    $elemString .= ">";
}
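
A possible variation on the loop above, sketched under two assumptions that are not in the original answer: the tag name is taken from the element instead of being hard-coded as "div", and attribute values are passed through `htmlspecialchars` so that a value containing a double quote does not break the rebuilt tag. The function name `openingTagFromAttributes` is made up for illustration.

```php
<?php
// Hypothetical helper: rebuilds an opening tag from a DOMElement's tag name
// and attributes, escaping attribute values for use inside double quotes.
function openingTagFromAttributes(DOMElement $el): string
{
    $tag = '<' . $el->tagName;

    foreach ($el->attributes as $attr) {
        // nodeValue is the decoded value, so re-escape &, <, > and " before quoting it.
        $tag .= sprintf(' %s="%s"', $attr->nodeName, htmlspecialchars($attr->nodeValue, ENT_COMPAT));
    }

    return $tag . '>';
}

$dom = new DOMDocument();
$dom->loadHTML('<div id="myelementID" class="hello" data-foo="bar">...</div>');
$el = $dom->getElementsByTagName('div')->item(0);
$elemString = openingTagFromAttributes($el);
```

The result is rebuilt from the parsed DOM, so it reflects the parser's view of the attributes and may not match the source string character for character.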
