I have looked around the web on how to scrape all headings (h1 to h6) with content. Like this <h2>Some Heading</h2>
, <h4>Some Heading</h4>
. I have even looked at file_get_html() which PHP does not recognize. The code I have written so far lets you see the content but with out the h1 tags. I am new to this so if anyone can help me I would appreciate it. Here is my code I have now:
<html>
<head>
<title></title>
</head>
<body>
<?php
$theurl = "http://www.msn.com";
if(!($contents=file_get_contents($theurl)))
{
echo 'Could not open URL';
exit;
}else{
echo "The $theurl is open <br />";
}
$pattern = "/<h[1-6]>(.*?)<\/h[1-6]>/si";
$found = preg_match_all($pattern,$contents,$matches);
if(is_array($matches) && count($matches) >= 1){
echo "Scraping $theurl<br />";
for($i = 1; $i <= $found - 1; $i++){
echo $matches[0][$i];
}
}else{
echo "No heading found";
}
?>
</body>
</html>