3

I have a set of HTML strings that can look like this:

<div id="myelementID" class="hello" data-foo="bar"> ... </div>

or

<div id="myelementID" class="world" data-this="that"> ... </div>

and so on; you get the idea. Apart from id="myelementID", none of the other attributes are fixed.

What I need is to extract the exact string of the opening <div> tag, e.g. <div id="myelementID" class="hello" data-foo="bar">, if an element with the ID "myelementID" exists.

As of now, I'm able to use DomDocument to check if the element exists:

        $dom = new DomDocument;
        $dom->validateOnParse = true;
        $internalErrors = libxml_use_internal_errors(true);
        $dom->loadHTML($html_string);
        libxml_use_internal_errors($internalErrors);
        $el = $dom->getElementById("myelementID");

From here, how can I get the element's HTML string? I'm open to using preg_match as well, which may be an even better solution.

Edit: Just to be clearer, I'm not looking for the content of the element. I'm looking for the string <div id="myelementID" etc="etc" this="that">. The problem is that I can't know which attributes the element has, apart from the fact that its ID is "myelementID".

BlueSun
Mike Feng

3 Answers

2

Use the DOMNode::C14N method to canonicalize the node to a string, then use substr and strpos to cut out the opening tag:

...
$el = $dom->getElementById("myelementID");
$elString = $el->C14N();

var_dump(substr($elString, 0, strpos($elString, '>') + 1));

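The output for your example (note that canonicalization sorts the attributes alphabetically, so their order can differ from the source string):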

string(51) "<div class="hello" data-foo="bar" id="myelementID">"

http://php.net/manual/ru/domnode.c14n.php
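For completeness, here is a minimal, self-contained sketch that joins the question's parsing code with the C14N approach above. The sample input is the question's first example, and the variable name $openingTag is only an illustrative choice, not something from the original posts.

```php
<?php
// Sample input taken from the question's first example.
$html_string = '<div id="myelementID" class="hello" data-foo="bar"> ... </div>';

$dom = new DOMDocument();
// Suppress libxml warnings about the HTML fragment, as in the question.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html_string);
libxml_use_internal_errors($previous);

$el = $dom->getElementById('myelementID');

$openingTag = null; // stays null when no element has that id
if ($el !== null) {
    // Canonicalize the element, then keep everything up to the first '>'.
    $elString = $el->C14N();
    $openingTag = substr($elString, 0, strpos($elString, '>') + 1);
}

var_dump($openingTag);
```

Because the result is rebuilt from the parsed DOM, it is a normalized version of the tag rather than a byte-for-byte copy of the input.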

RomanPerekhrest
1

A very simple regex that works (tested on RegExr). The only downside is that an attribute value containing a > would end the match prematurely, before the real end of the opening <div> tag.

<[^>]*\sid="myelementID"[^>]*>

A breakdown of the regex:

  • < matches the start of the opening tag (e.g. <div)
  • [^>]* matches any number of characters that are not >
  • \s matches a single whitespace character (e.g. a space)
  • id="myelementID" matches the id of your target element
  • [^>]* matches any number of characters that are not >
  • > matches the end of the opening <div> tag
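The pattern above could be applied in PHP with preg_match, roughly as sketched below. The sample input comes from the question; the ~ delimiter and the variable names $pattern and $openingTag are my own choices for illustration.

```php
<?php
// Sample input taken from the question's first example.
$html_string = '<div id="myelementID" class="hello" data-foo="bar"> ... </div>';

// '~' is used as the delimiter so the pattern itself needs no extra escaping.
$pattern = '~<[^>]*\sid="myelementID"[^>]*>~';

if (preg_match($pattern, $html_string, $matches) === 1) {
    $openingTag = $matches[0]; // the full opening tag, exactly as written
} else {
    $openingTag = null;        // no tag with that id was found
}

var_dump($openingTag);
```

Unlike the DOM-based approaches, this returns the tag verbatim from the input, with the attributes in their original order. Note that the pattern is not restricted to <div>: it matches any tag carrying that id.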
Peter Gordon
0

You can use the code below if you want to stick with DomDocument. It uses foreach() to iterate over the element's attributes and appends each attribute name and value to the $elemString variable.

$html_string = '<div id="myelementID" class="hello" data-foo="bar">...</div>';

$dom = new DomDocument;
$dom->loadHTML($html_string);
$el = $dom->getElementById("myelementID");

// getElementById() returns null when no element has that id
if ($el !== null)
{
    // Rebuild the opening tag; the tag name is hard-coded as "div"
    $elemString = "<div";
    foreach ($el->attributes as $attr)
    {
        $name = $attr->nodeName;
        $value = $attr->nodeValue;
        // Append each attribute as name="value"
        $elemString .= " {$name}=\"{$value}\"";
    }
    $elemString .= ">";
}


Mohammad