Possible Duplicate:
Grabbing the href attribute of an A element

I've gone through a lot of other posts and saw that they all assume a fixed anchor tag format, typically <a href="http://www.example.com/">Hello</a>, perhaps with a target attribute right after <a. I am trying to write a regex that matches the href of an anchor tag wherever it appears in the tag: it can come after alt, title, or target, or between them. There is also the case where the anchor tag uses single quotes instead of double quotes.
I've been trying this for half an hour without getting any result, so I'm posting it here.

Bibhas Debnath

1 Answer

Don't parse HTML with regex; use a library like DOMDocument or Simple HTML DOM Parser.
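
To illustrate why a parser handles the cases in the question (href anywhere among the attributes, single or double quotes), here is a minimal sketch. It is written in Python with the standard-library html.parser as a stand-in, not with PHP's DOMDocument, and the names HrefCollector and extract_hrefs are hypothetical helpers made up for this example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and strips the
        # surrounding quotes, so attribute order and quote style do not matter.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


def extract_hrefs(html):
    """Return the href values of all anchor tags in the given HTML string."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# href between other attributes, and a single-quoted variant
sample = (
    '<a title="Home" href="http://www.example.com/" target="_blank">Hello</a>'
    "<a target='_blank' href='http://www.example.com/about'>About</a>"
)
print(extract_hrefs(sample))
```

The same idea should carry over to a DOM library: let the parser tokenize the attributes, then read href by name instead of matching its position in the tag.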

fire
  • I already thought of that, but had a doubt. Won't using a library just for this increase processing time? I'm talking about thousands of anchor tags parsed per minute, so I just wanted to be sure. – Bibhas Debnath Aug 03 '11 at 13:25
  • Or is it that, because there are so many to process, it's better to use a library? – Bibhas Debnath Aug 03 '11 at 13:26
  • @Bibhas DOM is a native extension. And you shouldn't worry about performance without profiling and finding it has a significant negative impact. Also see http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662 – Gordon Aug 03 '11 at 13:27
  • On a side note: as much as I second the suggestion to use DOM (not SimpleHtmlDom; it sucks), I think this answer should have been a comment. It has become a trend lately to provide this answer to the never-ending "parse regex with dom questions". But the answer is so generic that it shouldn't garner any reputation for whoever supplied it imo, especially since the question is a duplicate as well. – Gordon Aug 03 '11 at 14:59
  • SOF needs some way to highlight most useful comments as well. Most of the time the answers lie in comments. – Bibhas Debnath Aug 04 '11 at 12:08