0

I want to postfix every link in a HTML file with a Google Analytics tracking code. The entire HTML is contained in the $content variabile. Is it possible to add this tracking code to all links, except mailto?

Psyche
  • 8,513
  • 20
  • 70
  • 85
  • 1
    Another "The Pony He Comes" question... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – tdammers Sep 17 '12 at 14:12

1 Answers1

1

No, you can't do it - at least not reliably so. HTML is very contextual, which means you need a real parser to pull this off. Regular expressions may cover many cases, but you'll end up with both false positives (your regex matching on something that is not really a link) and false negatives (real links being missed). See the link in my Pony comment for a more thorough... uhm... "explanation".

If you really have to go through the final HTML and post-process it, your best bet is to find a proper HTML parser (in a pinch, DOMDocument might do: IIRC, it can parse both XML and HTML), walk through the DOM tree and replace links as appropriate, then render the tree back into a string.

Ideally though, you have an HTML-aware template system in place (e.g. XSLT), in which case you can probably intercept the DOM tree earlier in the process, which means you can skip the additional parsing and rendering steps and go right to the DOM tree.

tdammers
  • 20,353
  • 1
  • 39
  • 56