-1

I have an HTML document with many links. I can currently extract all of the links, but I only want the ones that contain a certain word.


$dom = new DOMDocument;
$dom->loadHTML($html);
$links = $dom->getElementsByTagName('a');
foreach ($links as $link){
    echo $link->getAttribute('href');
}

I would like to list only the links that contain a certain word, for example: sendspace.com

The result would look more or less like this:
http://www.fileserve.com/file/eDpDMm9sad/
http://www.fileserve.com/file/7s83hjh347/

I would then like to convert these links to their SHA-1 hashes.

After the conversion, I want to save the HTML with the SHA-1 hashes applied to the links that contain the word.

Cody Gray - on strike
user653891

2 Answers

2

Using phpQuery, you can traverse the DOM and find the anchors (<a>) with the href attribute containing what you want:

$dom = phpQuery::newDocument($htmlSource);
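// *= matches a substring anywhere in the attribute; |= only matches a value or a hyphen-separated prefix
$anchors = $dom->find('a[href*="sendspace.com"]');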

$urls = array();

if($anchors) {
  foreach($anchors as $anchor) {
    $anchor = pq($anchor);
    $urls[] = $anchor->attr('href');
  }
}
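
For comparison, the same "href contains a word" filter can be sketched without phpQuery, using PHP's built-in DOMXPath on top of the DOMDocument from the question. This is only a minimal sketch: the sample `$html` string and its URLs are made up for illustration, and XPath's `contains()` is case-sensitive.

```php
<?php
// Hypothetical sample input; in the question this would be the real $html string.
$html = '<p><a href="http://www.sendspace.com/file/abc123">one</a>'
      . '<a href="http://example.org/page">two</a></p>';

$dom = new DOMDocument;
// Collect libxml warnings internally instead of printing them for imperfect markup.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Select only the anchors whose href attribute contains the substring.
$anchors = $xpath->query('//a[contains(@href, "sendspace.com")]');

$urls = array();
foreach ($anchors as $anchor) {
    $urls[] = $anchor->getAttribute('href');
}

print_r($urls);
```

With this sample input, only the first anchor's URL should end up in `$urls`.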
Andrew Moore
0

You can use regex to match your word (or whatever else) in the string like so:

foreach ($links as $link) {
    if (preg_match("/example\.com/i", $link->getAttribute('href'))) {
        // do things here!
    }
}
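
The remaining steps from the question (hashing the matching links and saving the HTML) could be filled in roughly as follows. This is a sketch under assumptions, not a definitive implementation: it assumes "convert to sha1" means replacing each matching `href` with the SHA-1 hash of the original URL, and the sample `$html` and its URLs are invented for illustration.

```php
<?php
// Hypothetical sample input standing in for the question's $html.
$html = '<html><body><a href="http://www.sendspace.com/file/abc123">one</a>'
      . '<a href="http://example.org/page">two</a></body></html>';

$dom = new DOMDocument;
// Collect libxml warnings internally instead of printing them for imperfect markup.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

foreach ($dom->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // Case-insensitive match on the word; the dot is escaped so it is literal.
    if (preg_match('/sendspace\.com/i', $href)) {
        // Assumption: the hash of the URL replaces the href value.
        $link->setAttribute('href', sha1($href));
    }
}

// Serialize the modified document back to an HTML string.
$result = $dom->saveHTML();
echo $result;
```

Links that do not match are left untouched. To save the result, `$result` could be written out with `file_put_contents()` to a path of your choosing.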
Matthew Rapati