11

I can't quite figure this out. I'm looking for some code that will add an attribute to an HTML element.

For example, let's say I have a string with an <a> in it, and that <a> needs an attribute added to it, so that <a> becomes <a style="xxxx:yyyy;">. How would you go about doing this?

Ideally it would add any attribute to any tag.

miken32
CafeHey
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Mike Axiak Oct 20 '10 at 23:17
  • I actually wrote a php function to do that... wanted to search for all hyperlinks in a block of text, and created a target='blank' attribute, or changed the existing one to be target='blank'. It was a pretty complex process, regex matching was just a small part. – Sam Dufel Oct 20 '10 at 23:21

2 Answers

21

It's been said a million times: don't use regexes for HTML parsing. Use a DOM parser instead:

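    $dom = new DOMDocument();
    // @ suppresses the warnings libxml raises on imperfect markup
    @$dom->loadHTML($html);
    $x = new DOMXPath($dom);

    // Select every <a> element and set (or overwrite) its style attribute
    foreach ($x->query("//a") as $node) {
        $node->setAttribute("style", "xxxx:yyyy;");
    }
    $newHtml = $dom->saveHTML();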
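
Since the question asks for any attribute on any tag, the loop above could be wrapped in a small function. This is only a sketch: the name `addAttribute` and its parameters are made up for illustration, and it uses `libxml_use_internal_errors()` in place of the `@` operator to keep parser warnings quiet.

```php
<?php
// Hypothetical helper (not part of PHP): sets $name="$value" on every <$tag> in $html.
function addAttribute(string $html, string $tag, string $name, string $value): string
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // setAttribute() adds the attribute, or overwrites it if it already exists.
    foreach ($dom->getElementsByTagName($tag) as $node) {
        $node->setAttribute($name, $value);
    }

    // Returns the whole document, including the doctype/html/body the parser adds.
    return $dom->saveHTML();
}

$out = addAttribute('<p>See <a href="/x">this</a>.</p>', 'a', 'style', 'xxxx:yyyy;');
echo $out;
```

Because `setAttribute()` overwrites an existing attribute of the same name, running this twice does not produce duplicate attributes.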
Byron Whitlock
  • 4
    How would you prevent `DOMDocument()` from adding the `<html><body>` wrapper around the given tag? – Sisir Feb 10 '14 at 05:11
  • 1
    Use another language. Joke aside, since DOMDocument is essentially crap you have to do some str_replace on the document to remove anything it added – Ms01 Sep 29 '14 at 07:32
  • You can use `$node->c14n()` to get the canonical HTML for the node. It won't wrap it in `<html>` tags. – Byron Whitlock Oct 16 '14 at 18:32
  • There are some quite good reasons to use Regex to parse HTML... Though in this case you are right, the OP should not aim for regex but for some other solution. – Philipp May 02 '18 at 10:35
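
Regarding the wrapper discussed in the comments above, one option that avoids `str_replace` is to serialize only the children of `<body>` by passing each node to `saveHTML()`. A rough sketch, assuming the input is a fragment rather than a full page; `addAttributeToFragment` is a hypothetical name:

```php
<?php
// Hypothetical helper: like the DOMDocument answer, but returns only the fragment.
function addAttributeToFragment(string $fragment, string $tag, string $name, string $value): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName($tag) as $node) {
        $node->setAttribute($name, $value);
    }

    // The parser places the fragment inside <body>; serialize just its children.
    $body = $dom->getElementsByTagName('body')->item(0);
    $result = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $result .= $dom->saveHTML($child);
        }
    }
    return $result;
}

$fragment = addAttributeToFragment('<p>See <a href="/x">this</a>.</p>', 'a', 'target', '_blank');
echo $fragment;
```

The parser may still normalize the markup in other ways (for instance, how it treats bare text at the top level), so the result is not guaranteed to be byte-for-byte identical to the input apart from the new attribute.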
10

Here is a regex-based approach:

  $result = preg_replace('/(<a\b[^><]*)>/i', '$1 style="xxxx:yyyy;">', $str);

Keep in mind, though, that a regex cannot reliably parse malformed HTML documents.

Vantomex
  • Why do you need the `<` in `[^><]`? – Déjà vu Oct 21 '10 at 02:36
  • To guard against accidents: the regex will not process `a` tags that aren't properly closed, for example an opening `<a` that is missing its `>`. If I didn't include the `<`, the regex would treat everything up to the next `>`, including the start of the following tag, as part of the `a` tag. – Vantomex Oct 21 '10 at 03:05
  • ...or if someone decides to include `<` as a literal character inside an attribute. Better fail the match then instead of messing up the HTML even further. – Tim Pietzcker Oct 21 '10 at 06:34
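
To make the point about `[^><]` concrete, here is a small comparison. The sample strings and variable names are invented for illustration; the first pattern is the one from the answer, and the second is the same pattern with `<` dropped from the character class.

```php
<?php
$strict = '/(<a\b[^><]*)>/i';   // pattern from the answer
$loose  = '/(<a\b[^>]*)>/i';    // same pattern without `<` in the class
$replacement = '$1 style="xxxx:yyyy;">';

$wellFormed = '<a href="/x">link</a>';
$unclosed   = '<a href="/x" <b>bold</b>';   // the <a tag never gets its `>`

// Well-formed input: the attribute is added as intended.
$r1 = preg_replace($strict, $replacement, $wellFormed);

// Unclosed <a: the strict pattern finds no match and leaves the string alone.
$r2 = preg_replace($strict, $replacement, $unclosed);

// The loose pattern runs on to the `>` of <b> and puts the attribute there.
$r3 = preg_replace($loose, $replacement, $unclosed);

echo $r1, "\n", $r2, "\n", $r3, "\n";
```

Note that either pattern appends a second `style` attribute if the tag already has one, whereas `setAttribute()` in the DOM approach overwrites the existing value.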