I am trying to extract the HTML content from a web page. I want only the content inside the <body> tags.
// $validlink is a link with an .htm extension; the page source is rather large
// (about 24,000 lines of HTML)
$thehtml = file_get_contents($validlink);
$thehtml = preg_match("/<body.*?>(.*?)<\/body>/is", $thehtml);
What else can I do? I am trying to insert this into a WordPress post, but $thehtml ends up empty for some odd reason. Could there be a timeout issue or something similar?
It probably isn't a timeout issue, though: I noticed that when I output just the result of file_get_contents($validlink), for some reason BODY is not found.
Another possible solution would be to take just the content between the first div and the last div found in the document.
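One thing that may be worth checking: preg_match() returns the number of matches (1 or 0, or false on error), not the matched text, and the captured groups are written to its third argument. The following is only a minimal sketch under that assumption; extractBodyHtml() is a hypothetical helper name, and the sample string stands in for the real page. It tries the body tags first and then falls back to the "first div to last div" idea described above.

```php
<?php
// Hypothetical helper (not from the original post): returns the markup
// inside <body>...</body>, or falls back to the span from the first <div
// to the last </div>. Returns null when neither is found.
function extractBodyHtml(string $html): ?string
{
    // preg_match() returns 1, 0, or false; the captured text goes into $matches.
    if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $matches) === 1) {
        return $matches[1];
    }

    // Fallback: everything from the first "<div" to the last "</div>".
    $start = stripos($html, '<div');
    $end   = strripos($html, '</div>');
    if ($start !== false && $end !== false && $end > $start) {
        return substr($html, $start, $end - $start + strlen('</div>'));
    }

    return null;
}

// Sample input standing in for file_get_contents($validlink).
$sample  = "<html><BODY class=\"x\">\n<p>Hello</p>\n</BODY></html>";
$thehtml = extractBodyHtml($sample);
```

If this still returns null on the real page, it may help to check that file_get_contents() did not return false and to look at preg_last_error(), since a lazy pattern on a very large document can run into PCRE's backtracking limit. Note also that PHP variable names are case-sensitive, so $validLink and $validlink are two different variables.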