
I'm having a problem matching the href part of a link using preg_match_all. Currently it captures 3 sections (full link, URL only, link text only), which is perfect, but the URL-only part also captures any other attributes located after the href attribute.

Also, how do I make the "href" text case-insensitive?

Code:

$content = '<a href="http://www.google.com" target="_blank">Google</a> is a search engine. <a href="http://www.yahoo.com" title="yahoo" target="_blank">Yahoo</a> is a search engine.';

preg_match_all('/<a href="([^<]*)">([^<]*)<\/a>/', $content, $matches);

print_r($matches);

Result:

Array
(
    [0] => Array
        (
            [0] => <a href="http://www.google.com" target="_blank">Google</a>
            [1] => <a href="http://www.yahoo.com" title="yahoo" target="_blank">Yahoo</a>
        )

    [1] => Array
        (
            [0] => http://www.google.com" target="_blank
            [1] => http://www.yahoo.com" title="yahoo" target="_blank
        )

    [2] => Array
        (
            [0] => Google
            [1] => Yahoo
        )

)
tckmn
Joe

1 Answer


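Your first capture group, [^<]*, matches everything up to the final ">, so it swallows any attributes that follow href. Stop the capture at the closing quote instead, and skip the remaining attributes separately. Try:

/<a href="([^"]*)"[^>]*>([^<]*)<\/a>/i

This captures the href value up to its closing quote, skips over the rest of the attributes up to the >, and then captures the link text right up to the next tag. Using [^>]* rather than [^>]+ means links with no attributes after href still match. The trailing i modifier makes the whole pattern case-insensitive, so HREF or Href match as well.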
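
As a rough check, here is a minimal sketch that runs a variant of this pattern against the sample string from the question. It assumes two adjustments to the pattern: [^>]* so that zero extra attributes are allowed, and the i modifier for case-insensitive matching. It still assumes href is the first attribute of the tag and that the link text contains no nested tags.

```php
<?php
// Sample input taken from the question.
$content = '<a href="http://www.google.com" target="_blank">Google</a> is a search engine. '
         . '<a href="http://www.yahoo.com" title="yahoo" target="_blank">Yahoo</a> is a search engine.';

// ([^"]*) stops at the closing quote of the href value,
// [^>]*   skips any remaining attributes (possibly none),
// ([^<]*) captures the link text up to the next tag,
// i       makes the match case-insensitive (A, HREF, Href, ...).
$pattern = '/<a href="([^"]*)"[^>]*>([^<]*)<\/a>/i';

preg_match_all($pattern, $content, $matches);

print_r($matches[1]); // URLs only
print_r($matches[2]); // link text only
```

With this input, $matches[1] should hold only the two URLs and $matches[2] the two link texts. For arbitrary real-world HTML a regex like this is fragile, so treat it as suitable for simple, predictable markup only.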

bizzehdee