A better way to parse an HTML file is to use DOMDocument
and, in many cases, combine it with DOMXPath
to run queries against the DOM and find the elements of interest.
For instance, to extract the meta description in your case, you could do:
$url='https://tipodense.dk/';
# create the DOMDocument and load the url, suppressing libxml warnings caused by malformed HTML
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->loadHTMLFile( $url );
libxml_clear_errors();
# create the XPath object and query for the meta description
$xp=new DOMXPath( $dom );
$expr='//meta[@name="description"]';
$col=$xp->query($expr);
if( $col && $col->length > 0 ){
    foreach( $col as $node ){
        echo $node->getAttribute('content');
    }
}
Which yields:
Har du brug for at vide hvad der sker i Odense? Vores fokuspunkter er især events, mad, musik, kultur og nyheder. Hvis du vil vide mere så læs med på sitet.
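If you need this lookup in more than one place, it could be wrapped in a small function. The following is only a minimal sketch: the function name getMetaDescription and the inline sample markup are my own assumptions, and it parses an HTML string with loadHTML rather than fetching a URL, so that it runs without network access.

```php
<?php
// Hypothetical helper (not part of the code above): returns the content of
// the first <meta name="description"> tag, or null when there is none.
function getMetaDescription( string $html ): ?string {
    // An empty string cannot be parsed, so bail out early.
    if( trim( $html ) === '' ) return null;

    // Suppress libxml warnings while parsing, then restore the old setting.
    $previous = libxml_use_internal_errors( true );
    $dom = new DOMDocument;
    $dom->loadHTML( $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $xp = new DOMXPath( $dom );
    $col = $xp->query( '//meta[@name="description"]' );
    if( $col && $col->length > 0 ){
        return $col->item( 0 )->getAttribute( 'content' );
    }
    return null;
}

// Sample markup, assumed purely for illustration.
$html = '<html><head><meta name="description" content="A sample page"></head><body></body></html>';
echo getMetaDescription( $html ), PHP_EOL;
```

Returning null rather than an empty string lets the caller tell a missing tag apart from a tag with an empty content attribute.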
Using the sitemap (or part of it), you could process every listed page like this:
$sitemap='https://tipodense.dk/sitemap-pages.xml';
$urls=array();
# create the DOMDocument, suppressing libxml warnings
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->strictErrorChecking=false;
$dom->recover=true;
# read the sitemap & store urls
$dom->load( $sitemap );
libxml_clear_errors();
$col=$dom->getElementsByTagName('loc');
foreach( $col as $node )$urls[]=trim( $node->nodeValue );
# load each page in turn and print its meta description
foreach( $urls as $url ){
    $loaded=$dom->loadHTMLFile( $url );
    libxml_clear_errors();
    # skip pages that could not be fetched or parsed
    if( !$loaded )continue;
    # create the XPath object for this page
    $xp=new DOMXPath( $dom );
    $expr='//meta[@name="description"]';
    $col=$xp->query( $expr );
    if( $col && $col->length > 0 ){
        foreach( $col as $node ){
            # escape the values because they are written into HTML output
            printf('<div>%s: %s</div>', htmlspecialchars( $url ), htmlspecialchars( $node->getAttribute('content') ) );
        }
    }
}
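The sitemap step can also be tried on its own, without touching the live site. This is a rough sketch under a few assumptions: the function name getSitemapUrls is hypothetical, the example.com URLs are made-up sample data, and the XML is passed in as a string via loadXML instead of being fetched with load.

```php
<?php
// Hypothetical helper: collects the <loc> values from sitemap XML.
function getSitemapUrls( string $xml ): array {
    $urls = array();
    // An empty string cannot be parsed, so return early.
    if( trim( $xml ) === '' ) return $urls;

    // Suppress libxml warnings while parsing, then restore the old setting.
    $previous = libxml_use_internal_errors( true );
    $dom = new DOMDocument;
    $loaded = $dom->loadXML( $xml );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    // loadXML returns false when the document is not well-formed.
    if( !$loaded ) return $urls;

    foreach( $dom->getElementsByTagName( 'loc' ) as $node ){
        // Sitemaps sometimes pad the URL with whitespace, so trim it.
        $urls[] = trim( $node->nodeValue );
    }
    return $urls;
}

// Sample sitemap, assumed purely for illustration.
$xml = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/about </loc></url>
</urlset>
XML;

print_r( getSitemapUrls( $xml ) );
```

The resulting array can then be fed to the same loop that loads each page and queries its meta description.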