$pattern='`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>([^<]*)</a>`isU';

I want to change ([^<]*) so that it stops at </a> rather than at any <, because an <img> tag could be inside the <a> tag.

Can anyone help? I'm lousy at regex.
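To see the problem concretely, here is a quick check of the pattern above against a link that wraps an image. The sample markup is made up for illustration. Because the `U` flag makes every quantifier lazy by default, `([^<]*)` stops at the `<` of `<img>`, and the engine then backtracks into the lazy `.*` inside the href group until it finds another quote. The match still succeeds, but with the wrong captures.

```php
<?php
// The pattern from the question, unchanged.
// Flags: i = case-insensitive, s = "." also matches newlines,
// U = quantifiers are lazy by default.
$pattern = '`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>([^<]*)</a>`isU';

// Hypothetical sample: the link content is an <img> tag, not plain text.
$html = '<a href="page.html"><img src="pic.png"></a>';

preg_match($pattern, $html, $m);

// The href group swallows the start of the <img> tag ...
var_dump($m[1]); // string(27) "href="page.html"><img src=""
// ... and the "link content" group ends up empty.
var_dump($m[3]); // string(0) ""
```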

dfilkovi
Comment (dynamic, Jun 13 '11 at 16:19): http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

3 Answers


You can use a PHP HTML parser to do this instead. I wouldn't use regex at all.

You can try: http://simplehtmldom.sourceforge.net/

PHP also has a DOM parser built in (the DOMDocument class).
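A minimal sketch of the built-in route, assuming the standard DOM extension is available (it is enabled in typical PHP builds). It collects each link's href and its inner markup, so an <img> inside the <a> is kept intact. The sample HTML and the `$links` array are illustrative only.

```php
<?php
// Hypothetical input: one link wrapping an image, one plain-text link.
$html = '<p>See <a href="page.html"><img src="pic.png"></a> and '
      . '<a href="other.html">text</a>.</p>';

$doc = new DOMDocument();
// "@" hides the warnings loadHTML() emits for imperfect real-world markup.
@$doc->loadHTML($html);

$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // Serialize every child node, so nested tags such as <img> survive.
    $inner = '';
    foreach ($a->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $links[$a->getAttribute('href')] = $inner;
}

print_r($links);
```

Unlike the regex, this does not care whether the href is quoted with ' or ", or how the tags inside the link are nested.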

Francis Gilbert

Changing ([^<]*) to an ungreedy match-all (.*?) might do the trick.
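One caveat worth checking: the question's pattern uses the `U` modifier, which inverts greediness in PCRE. Under `U`, `.*?` is actually greedy, and plain `.*` is the lazy one. The sketch below (sample markup is hypothetical) compares both spellings on a string with two links: `(.*?)` runs to the last `</a>`, while `(.*)` stops at the first.

```php
<?php
// Hypothetical input: two links, the first wrapping an image.
$html = '<a href="a.html"><img src="a.png"></a> <a href="b.html">b</a>';

// ([^<]*) replaced by (.*?). With the U flag this quantifier is GREEDY.
$withQ = '`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>(.*?)</a>`isU';

// ([^<]*) replaced by (.*). With the U flag this quantifier is LAZY.
$plain = '`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>(.*)</a>`isU';

preg_match_all($withQ, $html, $m1);
preg_match_all($plain, $html, $m2);

print_r($m1[3]); // one match spanning both links
print_r($m2[3]); // one entry per link
```

Without `U`, the usual reading applies and `(.*?)` is lazy, but then the `.*` in the href group would also need to become `.*?`.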

Mick Hansen

([^<]*) could be changed to ((?:[^<]|<(?!/a>))*), which uses a negative lookahead to match either a character that is not <, or a < that is not followed by /a>.
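Plugged into the question's pattern, the replacement group behaves as described. In this sketch (the sample markup is made up), each repetition consumes one character and refuses any `<` that starts `</a>`, so the content group can never run past the first closing tag, whether the quantifier is lazy or greedy. The `i` flag also makes the lookahead reject `</A>`.

```php
<?php
// The question's pattern with ([^<]*) replaced by the lookahead version.
$pattern = '`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>((?:[^<]|<(?!/a>))*)</a>`isU';

// Hypothetical input: an image link followed by a text link.
$html = '<a href="a.html"><img src="a.png"></a> <a href="b.html">b</a>';

preg_match_all($pattern, $html, $m);

// Group 3 holds each link's content, including the nested <img> tag.
print_r($m[3]);
```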

HOWEVER, as has been stated many times already, this is not a good way to parse HTML. First, it's horribly inefficient, and second, what happens if you have nested tags, such as <a><a></a></a>? Nested links are not valid HTML, but nesting is common among many other HTML elements.
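The nesting problem is easy to reproduce with any element that may legally contain itself. In this sketch (the `<div>` markup is a hypothetical sample), the lazy regex stops at the first `</div>`, which belongs to the inner element, while a DOM parser sees the full outer element.

```php
<?php
// Hypothetical input: a <div> nested inside another <div>.
$html = '<div>outer <div>inner</div> tail</div>';

// Regex: lazy match up to the first closing tag it finds.
preg_match('`<div>(.*?)</div>`is', $html, $m);
echo $m[1], "\n"; // prints "outer <div>inner" (cut off mid-element)

// DOM: the first <div> in document order is the outer one.
$doc = new DOMDocument();
@$doc->loadHTML($html);
$outer = $doc->getElementsByTagName('div')->item(0);
echo $outer->textContent, "\n"; // prints "outer inner tail"
```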

dlras2