I'm trying to parse an HTML document and store its URLs in an array with PHP. For example, if the source code of the document is:

Blah blah blah <a href="http://google.com">link</a> blab
<a href="http://yahoo.com">more links</a> ababasadsf

How do I find and grab the href attribute of the links and store each as an array element?

seangeng

1 Answer

Using phpQuery, you can traverse the DOM and find the anchors (<a>) with the href attribute defined:

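// Assumes phpQuery is already loaded and $htmlSource holds the HTML string.
$dom = phpQuery::newDocument($htmlSource);

// Select only the anchors that actually carry an href attribute.
$anchors = $dom->find('a[href]');

$urls = array();

if ($anchors) {
  foreach ($anchors as $anchor) {
    // Iteration yields raw DOM elements; pq() wraps each one for attr().
    $anchor = pq($anchor);
    $urls[] = $anchor->attr('href');
  }
}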
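
If installing phpQuery is not an option, the same idea can be sketched with PHP's bundled DOM extension. This is only a sketch and not part of the snippet above: it assumes the DOM extension is enabled, and extractHrefs is a hypothetical helper name chosen for illustration.

```php
<?php
// Hypothetical helper (illustrative name): collect the href value of every
// <a> element using PHP's bundled DOM extension instead of phpQuery.
function extractHrefs(string $htmlSource): array
{
    $urls = array();

    // loadHTML() rejects an empty string, so return early in that case.
    if (trim($htmlSource) === '') {
        return $urls;
    }

    $dom = new DOMDocument();

    // Real-world HTML is often malformed; buffer parser warnings
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($htmlSource);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // XPath equivalent of the 'a[href]' selector: anchors with an href.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $urls[] = $anchor->getAttribute('href');
    }

    return $urls;
}

// The sample markup from the question.
$htmlSource = 'Blah blah blah <a href="http://google.com">link</a> blab
<a href="http://yahoo.com">more links</a> ababasadsf';

$urls = extractHrefs($htmlSource);
print_r($urls);
```

With the sample markup from the question, $urls should end up holding http://google.com and http://yahoo.com, in document order.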
Andrew Moore