0

I'm trying to grab all the links and their content from a text, but my problem is that the links might also have other attributes like class or id. What would be the pattern for this?

What I have tried so far is:

/<a href="(.*)">(.*)<\/a\>/

Thank You, Radu

Radu Dragomir
    There are quite a few posts on SO that advise against parsing HTML with regular expressions. You should load the HTML into some sort of structure and walk through that structure. – vmpstr Feb 24 '12 at 16:09

2 Answers

3

As the comment on your question states, avoid using regex for HTML. A more reliable way to do it is with DOMDocument:

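$dom = new DOMDocument;
// Collect parser warnings from malformed markup instead of printing them
libxml_use_internal_errors(true);
// loadHTML() parses an HTML string; load() expects the path to an XML file
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// Select every <a> element, whatever other attributes it carries
$links = $xpath->query('//a');

foreach ($links as $link) {
    /* do something with this */
    $href = $link->getAttribute('href');
    $text = $link->nodeValue;
}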
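
To make the DOMDocument approach concrete, here is a minimal self-contained sketch. The helper name `extractLinks` and the sample markup are made up for illustration and are not part of the question; the sketch also assumes the input string is non-empty HTML.

```php
<?php
// Hypothetical helper (name is an assumption): returns one
// ['href' => ..., 'text' => ...] entry per <a> element found in $html.
function extractLinks(string $html): array
{
    $dom = new DOMDocument();

    // Keep libxml parse warnings internal rather than emitting them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $result[] = [
            'href' => $link->getAttribute('href'),
            // textContent concatenates the text of all nested nodes
            'text' => $link->textContent,
        ];
    }
    return $result;
}

// Made-up sample input: extra attributes and nested markup inside a link.
$html = '<p><a class="nav" id="home" href="/home">Home</a> and '
      . '<a href="/about" title="About us"><b>About</b> us</a></p>';

print_r(extractLinks($html));
```

For this sample, both links are returned regardless of attribute order or nested tags: `/home` with the text `Home`, and `/about` with the text `About us`.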

Edit:

An even better answer on the subject

Leigh
0

This should do it:

/<a .*?href="(.*?)"[^>]*>([^<]*)<\/a>/i

Read this and see if you still want to use it.
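
As a rough illustration of how the pattern above behaves, the following sketch runs it with `preg_match_all`. The sample markup is made up for the example; it is chosen to show one link the pattern handles and one it does not.

```php
<?php
// The pattern from this answer, unchanged.
$pattern = '/<a .*?href="(.*?)"[^>]*>([^<]*)<\/a>/i';

// Made-up sample input: the first link has extra attributes before href,
// the second has nested markup inside the link text.
$html = '<a class="nav" id="home" href="/home">Home</a> '
      . '<a href="/about"><b>About</b> us</a>';

// PREG_SET_ORDER groups the captures per match: [full, href, text].
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], "\n";
}
```

With this input only the first link is printed (`/home => Home`). The second is skipped because `([^<]*)` cannot cross the nested `<b>` tag, which is one reason a DOM parser is usually the safer choice.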

ohaal