I need to find every URL that appears in the "href" attribute of an HTML tag, using PHP.

As a result, I want an array containing every URL. I tried the code below, but it only finds the leading "href=" part. I know my code is very basic, but I don't know how to improve or change it to make it work. Thanks for any help.

<?php

$array = [];  
$string = file_get_contents("file.html");  
$begin = 0;  
$end = 0;

do {  
    $begin = strpos($string, "<a href=\"", $end + 1);  
    $end = strpos($string, "\"", $begin + 6);  
    $array[] = substr($string, ($begin + 6), ($end - $begin - 6));
} while ($begin !== false && $end !== false);
nice_dev

1 Answer


Use DOMDocument for that, not regex or manual string searching!

$html = file_get_contents('file.html');

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

$tags = $xpath->query('//a');
$links = [];

foreach ($tags as $tag) {
    $links[] = $tag->getAttribute('href');
}
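A slightly more defensive variant of the same idea is sketched below. It is not part of the original answer: the function name `extractLinks` and the sample markup are made up for illustration. It assumes the input may be malformed HTML, so it silences libxml parser warnings, and it queries `//a[@href]` so that anchors without an `href` do not add empty strings to the result (`getAttribute()` returns an empty string for a missing attribute).

```php
<?php
// Hypothetical helper (not from the original answer): returns the href
// of every <a> element that has one in the given HTML string.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    // Real-world HTML is often malformed; collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];

    // '//a[@href]' skips anchors that have no href attribute at all.
    foreach ($xpath->query('//a[@href]') as $tag) {
        $links[] = $tag->getAttribute('href');
    }

    return $links;
}

// Made-up sample: one plain link, one anchor without href,
// and one link whose href is not the first attribute.
$sample = '<p><a href="https://example.com/">One</a> '
        . '<a name="top">no link</a> '
        . '<a class="x" href="/about">Two</a></p>';

print_r(extractLinks($sample));
```

Because the parser works on elements and attributes rather than on raw text, the third link is found even though `href` is not the first attribute, a case that a literal search for `<a href="` would miss.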

Justinas