In my code I am trying to fetch the entire HTML of each page from my old website while ignoring all JavaScript (the AdSense code). I have about 800 pages, and it's hard for me to copy them one by one. The main problems I am facing are that my XPath is too long and gives me an error every time, and that the code only prints the text instead of the HTML. I don't know how to resolve this.
My XPath
/html/body/div/div/div/div[4]/table/tbody/tr/td/div/h2/table/tbody/tr/td/div[1]/table/tbody/tr/td[1]/div/table/tbody/tr/td/div/table/tbody/tr/td/div/table/tbody/tr/td/div
The errors I am getting are available at https://pastebin.com/FFRLr3vq
My Current PHP Code
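error_reporting(E_ERROR);

$urls = ["http://myoldwebsite.com/somepage.html"];

// Download a page and return its HTML source (false on failure).
function curlLoad($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    // The value 1 is no longer accepted by current libcurl; valid values are 0 and 2.
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
    $source = curl_exec($ch);
    return $source;
}

foreach ($urls as $url) {
    $source = curlLoad($url);
    if ($source === false) {
        continue; // the download failed, so there is nothing to parse
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($source); // @ hides warnings about malformed markup
    $xpath = new DOMXPath($doc);
    $nodeList = $xpath->query("//div[@class='pageContent']");
    // To check the result (nodeValue holds only the text, not the markup):
    foreach ($nodeList as $node) {
        echo "<p>" . $node->nodeValue . "</p>";
    }
}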
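
For reference, here is a minimal sketch of the result I am after, under a few assumptions. The helper name extractHtmlWithoutScripts and the sample markup are hypothetical, and the short class-based query stands in for the long path above. It removes every script element and then serialises each matched node with DOMDocument::saveHTML($node), which returns the markup rather than the text that nodeValue gives. One possible cause of the XPath errors, which I have not confirmed: browsers insert tbody elements that may not exist in the page source, so a path copied from developer tools may not match what DOMDocument parses.

```php
<?php
// Hypothetical helper: returns the outer HTML of each node matched by $query,
// with every <script> element removed from the document first.
function extractHtmlWithoutScripts(string $html, string $query): array {
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Copy the node list into an array before removing nodes from the tree.
    foreach (iterator_to_array($xpath->query('//script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    $results = [];
    foreach ($xpath->query($query) as $node) {
        $results[] = $doc->saveHTML($node);   // serialises the node including its tags
    }
    return $results;
}

// Sample markup standing in for a downloaded page (an assumption for this sketch).
$sample = '<html><body><div class="pageContent"><p>Hello <b>world</b></p>'
        . '<script>adsbygoogle.push({});</script></div></body></html>';

foreach (extractHtmlWithoutScripts($sample, "//div[@class='pageContent']") as $html) {
    echo $html, "\n";
}
```

In the real loop, $sample would be replaced by the string returned from curlLoad($url), and the returned HTML could be written to a file per page instead of echoed.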