12

I've got a string with HTML attributes:

$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';

How can I transform that string into an associative array, like:

array(
  'id' => 'header',
  'class' => array('foo', 'bar'),
  'style' => array(
    'background-color' => '#fff',
    'color' => 'red'
  )
)

so I can use the PHP array_merge_recursive function to merge 2 sets of HTML attributes.

Thank you

abernier

6 Answers

24

Use SimpleXML:

<?php
$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';

$x = new SimpleXMLElement("<element $attribs />");

print_r($x);

?>

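This assumes that the attributes are always name/value pairs. A valueless (boolean) attribute such as disabled, or a value containing an unescaped & or <, is not well-formed XML and makes the SimpleXMLElement constructor throw an exception.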
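To get from the SimpleXML object to the nested array the question asks for, the attributes still have to be post-processed. The following is only a minimal sketch, assuming PHP 7.1+; the function name attribs_to_array is hypothetical, and the style handling is a naive split on ";" and ":" that will break on values such as url(data:...;...).

```php
<?php
// Hypothetical helper, not part of the answer above: converts an attribute
// string into a nested array via SimpleXML.
function attribs_to_array(string $attribs): array
{
    // Throws if $attribs is not well-formed XML (e.g. boolean attributes).
    $element = new SimpleXMLElement("<element $attribs />");
    $result = [];

    foreach ($element->attributes() as $name => $value) {
        $value = trim((string) $value);

        if ($name === 'class') {
            // Split the class list on runs of whitespace.
            $result[$name] = preg_split('/\s+/', $value, -1, PREG_SPLIT_NO_EMPTY);
        } elseif ($name === 'style') {
            // Naive assumption: ";" separates declarations, ":" separates
            // property from value.
            $result[$name] = [];
            foreach (explode(';', $value) as $declaration) {
                if (strpos($declaration, ':') === false) {
                    continue; // skip empty trailing pieces
                }
                [$property, $propertyValue] = explode(':', $declaration, 2);
                $result[$name][trim($property)] = trim($propertyValue);
            }
        } else {
            $result[$name] = $value;
        }
    }

    return $result;
}

$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';
print_r(attribs_to_array($attribs));
```

For the string from the question this should produce the id, class and style entries in the shape shown there. Keep in mind that array_merge_recursive also combines scalar values under the same string key into arrays, so two different id values would end up as an array rather than one overriding the other.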

Ken Keenan
8

You could use a regular expression to extract that information:

$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';
$pattern = '/(\w+)\s*=\s*("[^"]*"|\'[^\']*\'|[^"\'\s>]*)/';
preg_match_all($pattern, $attribs, $matches, PREG_SET_ORDER);
$attrs = array();
foreach ($matches as $match) {
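    // Strip matching surrounding quotes; the empty-string check avoids an "uninitialized string offset" warning for unquoted empty values
    if ($match[2] !== '' && ($match[2][0] == '"' || $match[2][0] == "'") && $match[2][0] == $match[2][strlen($match[2])-1]) {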
        $match[2] = substr($match[2], 1, -1);
    }
    $name = strtolower($match[1]);
    $value = html_entity_decode($match[2]);
    switch ($name) {
    case 'class':
        $attrs[$name] = preg_split('/\s+/', trim($value));
        break;
    case 'style':
        // parse CSS property declarations
        break;
    default:
        $attrs[$name] = $value;
    }
}
var_dump($attrs);

The class value is already split at whitespace here. What remains is parsing the property declarations of style, which is a little harder because they can contain comments and URLs with ; in them.
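One possible way to fill in the style case is sketched below. It is not the answer author's code: the function name parse_style_declarations is hypothetical, and the approach (strip comments, then treat quoted strings and parenthesised groups as part of the value) is an assumption that covers the common cases but not nested parentheses or a ")" inside a quoted URL.

```php
<?php
// Hypothetical helper, not from the answer above.
function parse_style_declarations(string $style): array
{
    // Drop CSS comments first so a ";" inside them cannot split a declaration.
    $style = preg_replace('~/\*.*?\*/~s', '', $style);

    // A value is a run of plain characters, quoted strings, or (...) groups,
    // so a ";" inside url(...) or inside quotes does not end the declaration.
    $pattern = '/([\w-]+)\s*:\s*((?:[^;"\'(]|"[^"]*"|\'[^\']*\'|\([^)]*\))+)/';
    preg_match_all($pattern, $style, $matches, PREG_SET_ORDER);

    $declarations = [];
    foreach ($matches as $match) {
        // Property names are case-insensitive; later duplicates overwrite earlier ones.
        $declarations[strtolower($match[1])] = trim($match[2]);
    }

    return $declarations;
}

print_r(parse_style_declarations('background-color:#fff; color: red; '));
```

Inside the switch above, the style case could then assign $attrs[$name] = parse_style_declarations($value); before the break.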

Gumbo
  • 1
    Thank you Gumbo, your regex is cool. The only problem is that $attrs['class'] and $attrs['style'] are returned as strings, so it will be difficult to merge them with another $attribs string, for example merging these 2 sets of attribs: $attribs1 = 'class="foo bar"'; $attribs2 = 'class="lorem"'; into 'class="foo bar lorem"'. That's why I would like $attrs['class'] to return an array: array('foo', 'bar'). Do you have an idea how to enhance this? – abernier Jul 05 '09 at 10:36
  • 1
    I've just written an alternative regex which also parses HTML5 style boolean attributes (without an = sign) and uses a back reference for the quotes: `(\w+)\s*(=\s*(["'])(.*?)\2\s)?` – Angry Dan Nov 11 '12 at 22:45
7

An easy way could also be:

$atts_array = current((array) new SimpleXMLElement("<element $attribs />"));
Mariyo
5

You can't use a regular expression to parse HTML attributes, because the syntax is contextual. You can use regular expressions to tokenize the input, but you need a state machine to parse it.

If performance isn't a big deal, the safest way to do it is probably to wrap the attributes in a tag and then send it through an HTML parser. E.g.:

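function parse_attributes($input) {
  $dom = new DOMDocument();
  // A known tag name (div) avoids libxml's "invalid tag" warning.
  $dom->loadHTML("<div " . $input . "></div>");
  // loadHTML() wraps the fragment in <html><body>, so documentElement is
  // <html>; look the wrapper element up by tag name instead.
  $element = $dom->getElementsByTagName('div')->item(0);
  $attributes = array();
  foreach ($element->attributes as $attr) {
    $attributes[$attr->name] = $attr->value;
  }
  return $attributes;
}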

You could probably optimize the above by reusing the parser, or by using XMLReader or the SAX parser.

troelskn
  • 1
    Parse this: foo='bar' cuux="O'Reiley" zip="\"zap\"" – troelskn Jul 05 '09 at 10:53
  • 1
    @troelskn: The third attribute value declaration is invalid. The `"` need to be represented by character references. – Gumbo Jul 05 '09 at 11:01
  • 1
    You're right - I wasn't aware of that. I would still suggest using an xml/html parser, to account for all sorts of odd edge cases. – troelskn Jul 05 '09 at 13:38
3

Maybe this helps you. What it does:

  • An HTML DOM parser written in PHP 5+ that lets you manipulate HTML in a very easy way.
  • Requires PHP 5+.
  • Supports invalid HTML.
  • Find tags on an HTML page with selectors just like jQuery.
  • Extract contents from HTML in a single line.

http://simplehtmldom.sourceforge.net/

TigerTiger
  • Note that the one reason I ended up here is because the DOMProcessingInstruction has a `data` field which is the text within the ``. In case of a tag such as: `` you get a plain string like: `type="text/xsl" href="https://sms.m2osw.com/sitemap.xsl"` which you need to parse as attributes. – Alexis Wilke Aug 27 '17 at 19:45
2

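A simple and effective function to solve this. It matches name="value" pairs (double-quoted values only) as well as bare boolean attribute names, which are mapped to null: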

function attrString2Array($attr) {
  $atList = [];

  if (preg_match_all('/\s*(?:([a-z0-9-]+)\s*=\s*"([^"]*)")|(?:\s+([a-z0-9-]+)(?=\s*|>|\s+[a-z0-9]+))/i', $attr, $m)) {
    for ($i = 0; $i < count($m[0]); $i++) {
      if ($m[3][$i])
        $atList[$m[3][$i]] = null;
      else
        $atList[$m[1][$i]] = $m[2][$i];
    }
  }

  return $atList;
}

print_r(attrString2Array('<li data-tpl-classname="class" data-tpl-title="innerHTML" disabled nowrap href="#" hide src = "images/asas.gif">'));
print_r(attrString2Array('data-tpl-classname="class" data-tpl-title="innerHTML" disabled nowrap href="#" hide src = "images/asas.gif"'));

//Array
//(
//    [data-tpl-classname] => class
//    [data-tpl-title] => innerHTML
//    [disabled] => 
//    [nowrap] => 
//    [href] => #
//    [hide] => 
//    [src] => images/asas.gif
//)
mickmackusa
  • 1
    Welcome to StackOverflow! Please edit your answer to provide an explanation of your code. This will improve your answer's quality and make it more likely for it to get upvoted :) – Das_Geek Sep 27 '19 at 14:27
  • Did you notice that the OP's question is seeking a multi-dimensional result? – mickmackusa Mar 30 '21 at 04:00