
I want to extract HTML attributes from a string with PHP, but my attempts fail:

$string = '<ul id="value" name="Bob" custom-tag="customData">';
preg_filter("/(\w[-\w]*)=\"(.*?)\"/", '$1', $string ); // returns "<ul id name custom-tag>"
preg_filter("/(\w[-\w]*)=\"(.*?)\"/", '$2', $string ); // returns "<ul value Bob customData>"

What I want to return is:

array(
  'id' => 'value',
  'name' => 'Bob',
  'custom-tag' => 'customData'
);
Szymon Toda

2 Answers


Don't use regexes for parsing HTML. A DOM parser such as PHP's built-in DOMDocument can read the attributes directly:

$string = '<ul id="value" name="Bob" custom-tag="customData">';
$dom = new DOMDocument();
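// @ hides the warnings PHP raises if the markup is invalid
@$dom->loadHTML($string);
$ul = $dom->getElementsByTagName('ul')->item(0);
// Print each attribute value on its own line
echo $ul->getAttribute("id"), PHP_EOL;
echo $ul->getAttribute("name"), PHP_EOL;
echo $ul->getAttribute("custom-tag"), PHP_EOL;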
John Conde
  • What if there were more attributes? Also, why are you using error suppression here? – Amal Murali Mar 22 '14 at 14:27
  • This just demonstrates how to get those values. If they want to iterate through all of their attributes, it is not difficult to take this a step further. The error suppressor is there just in case there is invalid HTML. It will hide the warning PHP will throw. – John Conde Mar 22 '14 at 14:51
  • Agreed about the first part. But it's a lot better to just use `libxml_use_internal_errors()` to store the current value of error state, clear the error buffers and restore the old error state. The use of `@` is bad, IMO. – Amal Murali Mar 22 '14 at 14:56
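
The alternative to `@` suggested in the last comment could look roughly like the sketch below. It reuses the `$string` from the answer; `libxml_use_internal_errors()` and `libxml_clear_errors()` are standard PHP functions, while the variable names `$previous` and `$id` are only illustrative.

```php
<?php
$string = '<ul id="value" name="Bob" custom-tag="customData">';

// Collect libxml errors internally instead of emitting PHP warnings,
// remembering the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($string);

// Discard anything the parser recorded, then restore the old error state.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$ul = $dom->getElementsByTagName('ul')->item(0);
$id = $ul->getAttribute('id');
echo $id, PHP_EOL;
```

Unlike `@`, this only silences the parser's own messages, so unrelated errors on the same line would still surface.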

HTML is not a regular language and cannot be correctly parsed with a regex. Use a DOM parser instead. Here's a solution using PHP's built-in DOMDocument class:

$string = '<ul id="value" name="Bob" custom-tag="customData">';

$dom = new DOMDocument();
$dom->loadHTML($string);

$result = array();

$ul = $dom->getElementsByTagName('ul')->item(0);
if ($ul->hasAttributes()) {
    foreach ($ul->attributes as $attr) {
        $name = $attr->nodeName;
        $value = $attr->nodeValue;    
        $result[$name] = $value;
    }
}

print_r($result);

Output:

Array
(
    [id] => value
    [name] => Bob
    [custom-tag] => customData
)
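
If the same lookup is needed in several places, the loop above could be wrapped in a small helper. This is only a sketch under a few assumptions: the function name `getTagAttributes` and its parameters are hypothetical, the input is a non-empty HTML string, and an empty array is returned when no matching element exists rather than failing on a null `item(0)`.

```php
<?php
/**
 * Hypothetical helper: returns the attributes of the first <$tagName>
 * element found in $html as a name => value array.
 */
function getTagAttributes(string $html, string $tagName): array
{
    // Keep parser messages out of the output, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    $element = $dom->getElementsByTagName($tagName)->item(0);
    if ($element === null) {
        return $result; // no such element in the markup
    }
    foreach ($element->attributes as $attr) {
        $result[$attr->nodeName] = $attr->nodeValue;
    }
    return $result;
}

$attributes = getTagAttributes('<ul id="value" name="Bob" custom-tag="customData">', 'ul');
print_r($attributes);
```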
Amal Murali