I'm working on a regular expression to extract the tag name and attributes from an HTML element, but I'm having trouble matching the attributes: only the last attribute is stored in the matches array.

Here is the code:

<?php
    $subject = '<font face="arial" size="1" color="red">hello world!</font>';
    $find= '/<(?P<tag>\w+)\s+((?P<attr>\w+)=(?P<value>[^\s""\'>]+|"[^"]*"|\'[^\']*\')\s*)*\/?>/si';

    preg_match_all( $find, $subject, $matches );
?>

Can someone help me out?

Many thanks

Maarten
  • Drop that and use [XPath](http://www.w3schools.com/XPath/default.asp) instead. – Welbog Jul 12 '10 at 15:55
  • You can't reliably parse HTML with regular expressions. See the awesome rant on this subject here: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – sbleon Jul 12 '10 at 15:57
  • But what if I want to parse html to xhtml? I read that xpath is xhtml compatible. – Maarten Jul 12 '10 at 16:11

1 Answer

Some important points:

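  • You shouldn't use regex to parse HTML. PHP has many excellent HTML parsing libraries, including the built-in DOM extension (`DOMDocument`).
  • When a capturing group is repeated within a single match (as `(...)*` does in your pattern), it keeps only the text from its last repetition. That is why only the last attribute ends up in `$matches`.
    • One notable exception is .NET regex, which exposes every capture of a group through its `Captures` collection.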
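
If a regex is used anyway on small, well-known input, one common workaround for the last-capture behaviour is to match in two steps: capture the whole attribute string once, then run a second pattern over it with `preg_match_all`. The following is only a minimal sketch under that assumption, not a robust HTML parser; the group name `attrs` and the variable names are made up for illustration, and the quote stripping is deliberately simplistic.

```php
<?php
// Sample input taken from the question.
$subject = '<font face="arial" size="1" color="red">hello world!</font>';

// The question's pattern (duplicate quote in the character class removed).
// The repeated group keeps only its last repetition.
$original = '/<(?P<tag>\w+)\s+((?P<attr>\w+)=(?P<value>[^\s"\'>]+|"[^"]*"|\'[^\']*\')\s*)*\/?>/si';
preg_match($original, $subject, $o);
// $o['attr'] is "color" here: "face" and "size" were overwritten.

// Step 1: match the opening tag and capture the entire attribute string once.
// "attrs" is a hypothetical group name chosen for this sketch.
$tagPattern = '/<(?P<tag>\w+)(?P<attrs>(?:\s+\w+=(?:"[^"]*"|\'[^\']*\'|[^\s"\'>]+))*)\s*\/?>/i';

// Step 2: a pattern for a single attribute, applied repeatedly.
$attrPattern = '/(?P<attr>\w+)=(?P<value>"[^"]*"|\'[^\']*\'|[^\s"\'>]+)/';

$tag = null;
$attributes = [];
if (preg_match($tagPattern, $subject, $m)) {
    $tag = $m['tag'];
    // PREG_SET_ORDER yields one array per attribute found.
    preg_match_all($attrPattern, $m['attrs'], $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        // Simplification: strip any surrounding quote characters.
        $attributes[$pair['attr']] = trim($pair['value'], '"\'');
    }
}

print_r($tag);
print_r($attributes);
```

Because the second pattern is applied with `preg_match_all` rather than repeated inside one group, every attribute gets its own match entry. This still breaks on malformed markup, valueless attributes, and `>` inside attribute values, so a real parser remains the safer choice.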

polygenelubricants