
I have this code, which extracts all links from a website. How do I edit it so that it only extracts links that end in .mp3? Here is the code:

preg_match_all("/\<a.+?href=(\"|')(?!javascript:|#)(.+?)(\"|')/i", $html, $matches); 
andrew

2 Answers


Update:

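A nice solution would be to use DOM together with XPath, as @zerkms mentioned in the comments. PHP's DOMXPath implements XPath 1.0, which has no ends-with() function, so the "ends with .mp3" test has to be built from substring() and string-length():

$doc = new DOMDocument();
$doc->loadHTML($yourHtml);
$xpath = new DOMXPath($doc);

// XPath 1.0 equivalent of ends-with(@href, ".mp3"): compare the last four characters of href
$links = array();
foreach ($xpath->query('//a[substring(@href, string-length(@href) - 3) = ".mp3"]/@href') as $attr) {
    $links[] = $attr->value; // each result is a DOMAttr
}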
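Because PHP's DOMXPath only implements XPath 1.0, ends-with() is not available there. Below is a self-contained sketch of an XPath 1.0 equivalent: it compares the last four characters of @href using substring() and string-length(), and uses translate() to fold only the letters M and P to lower case so that .MP3 links match too. The sample HTML is made up for illustration.

```php
<?php
// Sketch: emulate ends-with(@href, ".mp3") in XPath 1.0, case-insensitively.
// The $html below is a hypothetical sample document.
$html = '<html><body>'
      . '<a href="/music/track1.mp3">Track 1</a>'
      . '<a href="/music/track2.MP3">Track 2</a>'
      . '<a href="/music/notes.txt">Notes</a>'
      . '<a href="/music/track3.mp3?dl=1">Track 3</a>'
      . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep libxml warnings about sloppy HTML quiet
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// translate(@href, "MP", "mp") lower-cases only M and P (the letters in ".mp3"),
// which keeps the string length unchanged; substring(..., length - 3) then
// yields the last four characters for comparison with ".mp3".
$query = '//a[substring(translate(@href, "MP", "mp"), string-length(@href) - 3) = ".mp3"]/@href';

$links = array();
foreach ($xpath->query($query) as $attr) {
    // Each result is a DOMAttr; ->value is the original, untranslated href.
    $links[] = $attr->value;
}

print_r($links);
```

Note that "track3.mp3?dl=1" is not selected, because its last four characters are "dl=1".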

Original Answer:

I would use DOM for this:

$doc = new DOMDocument();
$doc->loadHTML($yourHtml);

$links = array();
foreach ($doc->getElementsByTagName('a') as $elem) {
    if ($elem->hasAttribute('href')
        && preg_match('/\.mp3$/i', $elem->getAttribute('href'))) {
        $links[] = $elem->getAttribute('href');
    }
}

var_dump($links);
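The check above requires the href to end in .mp3 exactly, so links such as song.mp3?dl=1 are skipped. If those should count as well, one option (a sketch, not part of the original answer) is to run the same test on the URL's path component, extracted with parse_url(). The sample HTML is hypothetical.

```php
<?php
// Sketch: match .mp3 on the path part of each href, ignoring any query
// string or fragment. The $html below is a hypothetical sample.
$html = '<p>'
      . '<a href="http://example.com/a.mp3">A</a>'
      . '<a href="http://example.com/b.mp3?dl=1">B</a>'
      . '<a href="http://example.com/c.MP3#t=30">C</a>'
      . '<a href="http://example.com/page.html">D</a>'
      . '<a>no href</a>'
      . '</p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate malformed HTML quietly
$doc->loadHTML($html);
libxml_clear_errors();

$links = array();
foreach ($doc->getElementsByTagName('a') as $elem) {
    if (!$elem->hasAttribute('href')) {
        continue;
    }
    $href = $elem->getAttribute('href');
    // parse_url() returns null when there is no path and false for badly
    // malformed URLs; casting to string turns both into "".
    $path = (string) parse_url($href, PHP_URL_PATH);
    if (preg_match('/\.mp3$/i', $path)) {
        $links[] = $href; // keep the full href, query string included
    }
}

print_r($links);
```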
hek2mgl

I would prefer XPath, which is designed for querying XML/XHTML documents:

$DOM = new DOMDocument();
@$DOM->loadHTML($html); // use the @ to suppress warnings from invalid HTML
$XPath = new DOMXPath($DOM);

$links = array();
$link_nodes = $XPath->query('//a[contains(@href, ".mp3")]');
foreach ($link_nodes as $link_node) {
    $source = $link_node->getAttribute('href');
    // contains() also matches e.g. "song.mp3.html", so make sure .mp3 is at the end
    if (!preg_match('/\.mp3$/i', $source)) {
        continue;
    }
    $links[] = $source;
}

XPath 2.0 has an ends-with() function that could replace contains(), but PHP's DOMXPath implements only XPath 1.0, so it is not available here. You therefore need an extra check that .mp3 is at the end of the string; contains() alone would also match URLs such as song.mp3.html or song.mp3?dl=1.
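If you would rather keep the question's preg_match_all() approach, a rough sketch is to capture the opening quote and require the href to end in .mp3 right before the matching closing quote. The pattern and sample HTML are illustrative assumptions; regular expressions are brittle for HTML, so the DOM-based approaches above remain the safer choice.

```php
<?php
// Sketch: the question's regex, restricted to hrefs ending in ".mp3".
// The $html below is a hypothetical sample.
$html = '<a href="/songs/one.mp3">One</a>'
      . "<a class='dl' href='/songs/two.MP3'>Two</a>"
      . '<a href="/songs/three.mp3?x=1">Three</a>'
      . '<a href="javascript:play(\'four.mp3\')">Four</a>'
      . '<a href="/index.html">Home</a>';

// <a\s[^>]*?        an <a ...> tag (not <abbr>), without leaving the tag
// (["'])            opening quote, captured as group 1
// (?!javascript:|#) same exclusions as the original pattern
// ([^"']*?\.mp3)    href value that stays inside the quotes and ends in .mp3
// \1                closing quote must be the same character as the opening one
$pattern = '/<a\s[^>]*?href=(["\'])(?!javascript:|#)([^"\']*?\.mp3)\1/i';

preg_match_all($pattern, $html, $matches);
$links = $matches[2]; // group 2 holds the href values

print_r($links);
```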

Sam