preg_match_all not matching href section correctly

Question

I'm having a problem matching the href section of a link using preg_match_all. Currently it captures 3 sections (full link, URL only, link text only), which is perfect, but the URL-only part also captures any other attributes located after the href attribute.

Also, how do I make the "href" match case-insensitive?

Code:

$content = '<a href="http://www.google.com" target="_blank">Google</a> is a search engine. <a href="http://www.yahoo.com" title="yahoo" target="_blank">Yahoo</a> is a search engine.';

preg_match_all('/<a href="([^<]*)">([^<]*)<\/a>/', $content, $matches);

print_r($matches);

Result:

Array
(
    [0] => Array
        (
            [0] => <a href="http://www.google.com" target="_blank">Google</a>
            [1] => <a href="http://www.yahoo.com" title="yahoo" target="_blank">Yahoo</a>
        )

    [1] => Array
        (
            [0] => http://www.google.com" target="_blank
            [1] => http://www.yahoo.com" title="yahoo" target="_blank
        )

    [2] => Array
        (
            [0] => Google
            [1] => Yahoo
        )

)

[Do not use regexes to parse HTML](http://stackoverflow.com/a/1732454/344643). Use an [XML parser](http://php.net/manual/en/class.domdocument.php) instead. — Waleed Khan, Feb 21 '13 at 22:25
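For reference, the parser-based approach suggested in this comment could look roughly like the sketch below. It assumes the same $content string as in the question; the $links array and its 'href' and 'text' keys are names made up for illustration. Because the HTML parser normalizes tag and attribute names, this also handles HREF written in upper case.

```php
<?php
// Sketch: extract href values and link text with DOMDocument instead of a regex.
$content = '<a href="http://www.google.com" target="_blank">Google</a> is a search engine. <a href="http://www.yahoo.com" title="yahoo" target="_blank">Yahoo</a> is a search engine.';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them;
// the input is a fragment, not a complete HTML document.
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();

// $links is a hypothetical result structure, not part of any API.
$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = [
        'href' => $a->getAttribute('href'), // attribute value only, no other attributes
        'text' => $a->textContent,          // text between <a ...> and </a>
    ];
}

print_r($links);
```

Each entry holds only the URL and the link text, regardless of how many other attributes the tag carries or in what order they appear.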

score 2 · Accepted Answer · answered Feb 21 '13 at 22:37


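Your first capture group runs all the way to the closing "> of the tag, so it does not take any other attributes into account. Stop the capture at the closing quote of the href value instead. Try: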

/<a href="([^"]*)"[^>]*>([^<]*)<\/a>/

This pulls out the href value, skips over the rest of the attributes, and then captures the link text right up to the next tag.
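As a minimal sketch of this idea applied to the question's input, the snippet below also adds the i modifier, which is one way to make the "href" match case-insensitive (the second part of the question). It uses [^>]* for the attribute-skipping part so that a link with nothing after the href value still matches; the $pattern variable name is only illustrative.

```php
<?php
$content = '<a href="http://www.google.com" target="_blank">Google</a> is a search engine. <a href="http://www.yahoo.com" title="yahoo" target="_blank">Yahoo</a> is a search engine.';

// ([^"]*)  captures the URL and stops at the closing quote.
// [^>]*    skips zero or more characters of remaining attributes up to ">".
// ([^<]*)  captures the link text up to the next tag.
// i        makes the whole pattern case-insensitive, so HREF= and <A ...> match too.
$pattern = '/<a href="([^"]*)"[^>]*>([^<]*)<\/a>/i';

preg_match_all($pattern, $content, $matches);

print_r($matches[1]); // URLs only
print_r($matches[2]); // link text only
```

Note that this still assumes href is the first attribute after <a and that the value is double-quoted; markup that deviates from that is where a DOM parser becomes the safer choice.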


bizzehdee


Will this also match urls like this: Text – Deb Feb 03 '14 at 16:19
