Getting links from a context using preg_match_all

Question

I'm trying to grab all the links and their content from a text, but my problem is that the links might also have other attributes like class or id. What would be the pattern for this?

What I tried so far is:

/<a href="(.*)">(.*)<\/a\>/

Thank You, Radu

There are quite a few posts on SO that advise against parsing html using regular expressions. You should load the html into some sort of a structure and walk through that structure — vmpstr, Feb 24 '12 at 16:09

score 3 · Answer 1 · edited May 23 '17 at 12:20

As the comment on your question says, avoid using regex for HTML. The correct way to do it is with DOMDocument:

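$dom = new DOMDocument;
// loadHTML() parses an HTML string; load() would read an XML file from a path.
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// Select every <a> element, whatever other attributes it carries.
$links = $xpath->query('//a');

foreach ($links as $link) {
    /* do something with this */
    $href = $link->getAttribute('href');
    $text = $link->nodeValue;
}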
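
For a version you can run as-is, here is a minimal sketch of the same DOM approach. The helper name extract_links and the sample markup are made up for illustration and are not part of the original answer. It parses a string with loadHTML(), silences libxml warnings about sloppy markup, and shows that extra attributes such as class or id, and even nested tags inside the link, do not get in the way.

```php
<?php
// Hypothetical helper (assumption, not from the answer): returns href and text for each <a href>.
function extract_links(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    // Only anchors that actually have an href attribute.
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

// Made-up sample input: attributes in varying order, one link with nested markup.
$sample = '<p><a class="ext" id="first" href="https://example.com/a">First</a> '
        . '<a href="/b" class="nav"><b>Second</b></a></p>';

$links = extract_links($sample);
foreach ($links as $link) {
    echo $link['href'], ' => ', $link['text'], PHP_EOL;
}
```

Because the parser handles the attributes, their order and number never have to appear in a pattern.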

Edit:

An even better answer on the subject

score 0 · Accepted Answer · edited May 23 '17 at 12:03

This should do it:

/<a .*?href="(.*?)"[^>]*>([^<]*)<\/a>/i

Read this and see if you still want to use it.
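
If you do go with the regex, this is roughly how the pattern above could be fed to preg_match_all. It is only a sketch: the sample HTML and variable names are invented for illustration. PREG_SET_ORDER groups the captures per link, so each match holds the href in index 1 and the link text in index 2. The last call shows one limitation: a link whose content contains another tag is not matched, because the text group only accepts characters other than "<".

```php
<?php
// Pattern copied from the answer; everything else here is an illustrative assumption.
$pattern = '/<a .*?href="(.*?)"[^>]*>([^<]*)<\/a>/i';

$html = '<p>See <a class="ext" href="https://example.com/a" id="x">First</a> '
      . 'and <a href="/b">Second</a>.</p>';

// PREG_SET_ORDER: $matches[n] = [full match, href, link text] for the n-th link.
$count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    $found[] = ['href' => $m[1], 'text' => $m[2]];
}

// A link with nested markup: the pattern finds nothing here.
$nested = preg_match_all($pattern, '<a href="/c"><b>Third</b></a>', $ignored);

foreach ($found as $link) {
    echo $link['href'], ' => ', $link['text'], PHP_EOL;
}
```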

answered Feb 24 '12 at 16:09

ohaal
