A client of mine has hundreds of simple, single-page static websites. They are landing pages for various marketing campaigns, and they all share an identical layout -- a simple two-column page with a header and footer.
I want to copy the content of a few specific divs on each of these landing pages and use it to populate a database, so I can rebuild the sites on a new backend.
Basically, each page has a "main" div and a "sidebar" div, and I need to copy their HTML exactly as is, except that the image URLs should point to locally hosted copies.
I was able to create an array of all image URLs for a given domain using this:
$url="http://example.com";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
// save image to local server
}
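One caveat I'm aware of: if the src values are relative, file_get_contents() can't fetch them as-is, so they would need to be resolved against the page URL first. A rough sketch (the toAbsoluteUrl helper is my own name for it, and it only handles root-relative paths):

function toAbsoluteUrl($src, $pageUrl) {
    // leave absolute URLs alone
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }
    // resolve root-relative paths like /img/logo.png against the site root
    $parts = parse_url($pageUrl);
    return $parts['scheme'] . '://' . $parts['host'] . '/' . ltrim($src, '/');
}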
I was able to capture the content of the main div using this method:
$maindiv = $doc->getElementById('main');
echo $doc->saveHTML($maindiv);
which seemed to work well, but the output did not include the inner HTML for the images. Basically, this div contains a paragraph, followed by an HTML bullet list, followed by an image or two, and perhaps a final paragraph. The code above grabbed the text and the bullet list but not the image markup.
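For what it's worth, once I can reliably get the full markup, my plan for the URL replacement is roughly this (the 'images/' path matches the download sketch above and is just a placeholder):

$maindiv = $doc->getElementById('main');
foreach ($maindiv->getElementsByTagName('img') as $img) {
    // point the copied markup at the locally saved file
    $src = $img->getAttribute('src');
    $img->setAttribute('src', 'images/' . basename($src));
}
$mainHtml = $doc->saveHTML($maindiv); // the div's outer HTML, children included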
Is there a better way to do this? If I can figure out how to iterate over these pages and reliably grab the contents of both divs, it will save a huge amount of manual time.
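For context, the overall loop I have in mind is shaped something like this (the $domains list and the database step are placeholders):

$domains = ['http://example.com', 'http://example.org']; // placeholder list

foreach ($domains as $url) {
    $doc = new DOMDocument();
    @$doc->loadHTML(file_get_contents($url));

    foreach (['main', 'sidebar'] as $id) {
        $div = $doc->getElementById($id);
        if ($div === null) {
            continue; // this page is missing the div
        }
        $html = $doc->saveHTML($div);
        // TODO: rewrite img src attributes, then store $html in the database
    }
}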