I'm trying to download a website for offline viewing, and the images on the site are wrapped in <picture></picture> elements.

When I save a page using the browser or website-download software, the images are not downloaded.

For example, take this image:

<picture>
    <source srcset="thumb1.jpg" media="(min-width: 1200px)">
    <source srcset="thumb2.jpg" media="(min-width: 992px)">
    <source srcset="thumb3.jpg" media="(min-width: 600px)">
    <source srcset="thumb4.jpg" media="(min-width: 320px)">

    <img src="main-image.jpg">
</picture>

The browser downloads main-image.jpg from the img tag, but it doesn't download the source images thumb1.jpg, thumb2.jpg, etc. As a result, none of the images show up in the saved page.

Why is that? Why doesn't the browser download the source images?

The website is built on a PHP CMS called Concrete5, so the HTML is generated and I can't change it. The website is http://www.exrx.net/concrete

Is there a solution for this problem? Is there any software, free or paid, that can accomplish this?

I tried HTTrack and Getleft. Both behave the same way as the browser.

Tarek Mostafa

1 Answer

Is there a solution for this problem? Is there any software, free or paid, that can accomplish this?

I don't know of a tool that also downloads the files referenced in source tags.

Edit 02

With the following snippet you can collect the URLs of all images:

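$html = file_get_contents('http://www.exrx.net/concrete/');
if ($html === false) {
    exit('Could not fetch the page.');
}

// Collect libxml warnings internally instead of printing them; the parser
// may complain about HTML5 tags such as <picture>.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$imgs = [];
foreach ($doc->getElementsByTagName('picture') as $picture) {
    $img = [];
    // The fallback <img> comes first; skip it if the <picture> has none.
    $fallback = $picture->getElementsByTagName('img')->item(0);
    if ($fallback !== null) {
        $img[] = $fallback->getAttribute('src');
    }
    // Then the srcset value of every <source>, in document order.
    foreach ($picture->getElementsByTagName('source') as $source) {
        $img[] = $source->getAttribute('srcset');
    }
    $imgs[] = $img;
}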

Printing the result with print_r($imgs) produces output of this shape:

(
    [0] => Array
        (
            [0] => path/to/file/i1-the original-img-tag-image.gif
            [1] => path/to/file/i2.png
            [2] => path/to/file/i3.png
            [3] => path/to/file/i4.png
            [4] => path/to/file/i5.png
        )
    ...
)
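
One caveat with the collected values: a srcset attribute is allowed to hold several comma-separated candidates, each optionally followed by a width or density descriptor (for example "a.jpg 1x, a@2x.jpg 2x"). The markup in the question uses a single URL per source, so this may not matter for this site. The helper below is a hypothetical addition, not part of the original snippet, and is only a minimal sketch: it splits naively on commas, so it would mis-split URLs that themselves contain commas.

```php
<?php
// Hypothetical helper (srcsetUrls is an illustrative name): split a srcset
// value into its candidate URLs, dropping width/density descriptors.
// Assumption: the URLs themselves contain no commas.
function srcsetUrls(string $srcset): array
{
    $urls = [];
    foreach (explode(',', $srcset) as $candidate) {
        // Each candidate has the form "url [descriptor]"; keep the URL part.
        $parts = preg_split('/\s+/', trim($candidate));
        if ($parts[0] !== '') {
            $urls[] = $parts[0];
        }
    }
    return $urls;
}

// Example with two candidates that carry density descriptors.
print_r(srcsetUrls('thumb1.jpg 1x, thumb1@2x.jpg 2x'));
```

Applied to each srcset entry collected in $imgs, this yields plain URLs that can be fetched one by one.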

Each collected URL can then be downloaded, and the reference in the saved page replaced with the path of the local copy.
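
One possible way to do that replacement is sketched here. It is not part of the original answer: the function name localizePictures, the images/ target directory and the choice to name local files after the URL's basename are all assumptions made for illustration. It also assumes one URL per srcset (as in the question's markup) and that no two images share a basename.

```php
<?php
// Hypothetical sketch: rewrite every <picture> image reference to a local
// path and return the rewritten HTML plus a map of original => local paths.
// Assumptions: one URL per srcset, unique basenames, 'images/' as target.
function localizePictures(string $html, string $dir = 'images/'): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $map = []; // original URL => local path
    foreach ($doc->getElementsByTagName('picture') as $picture) {
        // <source> carries its URL in srcset, <img> in src.
        foreach (['source' => 'srcset', 'img' => 'src'] as $tag => $attr) {
            foreach ($picture->getElementsByTagName($tag) as $node) {
                $url = $node->getAttribute($attr);
                if ($url === '') {
                    continue;
                }
                // Name the local file after the last path segment of the URL.
                $path = (string) parse_url($url, PHP_URL_PATH);
                $local = $dir . basename($path);
                $map[$url] = $local;
                $node->setAttribute($attr, $local);
            }
        }
    }
    return [$doc->saveHTML(), $map];
}
```

The keys of the returned map are the files that still need to be fetched, for instance with file_get_contents() and file_put_contents(), after resolving relative URLs against the page address. The returned HTML can be saved as the offline page.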

Src:
- libxml_use_internal_errors() on SO
- DOM on php.net

1stthomas
  • Thank you very much, your first solution sounds more doable to me than the other. You said "Then remove the `new \HtmlObject\Image()` and replace it with your own tag". I'm new to concrete5; what should I replace the `$tag` variable with? – Tarek Mostafa Jan 17 '18 at 08:28
  • Does this also apply to images placed inside the content block? 90% of the images on the website are used inside content blocks, and only a few in the Image block. – Tarek Mostafa Jan 17 '18 at 08:29
  • @Tarek Mostafa: `Is this also applied to images placed inside the content block?` - No, if you use the default core `content block` then you will get a `picture` tag with a single `img` tag. – 1stthomas Jan 17 '18 at 11:53
  • @Tarek Mostafa: For your second question `... what should I replace ...` I updated the answer. – 1stthomas Jan 17 '18 at 12:04
  • Hi. I tried your code but it gave me many errors, among them `Call to a member function getThumbnail() on null` and `Call to a member function alt() on string`. But even if the code works, you say it will not affect the content block, which is 90% of the problem. – Tarek Mostafa Jan 17 '18 at 14:25
  • @Tarek Mostafa: It will not affect the content block because the content block does not have any source tags. Therefore you will not have the problem described in your question. I will look into your errors in 2.5h, after work. – 1stthomas Jan 17 '18 at 14:34
  • Thanks for your help. I think there's something wrong then, because all images placed in content blocks have the `source` tag. You can check the home page http://www.exrx.net/concrete and look in the developer tools; you will see they have `source` tags. We are using the Fruitful theme. Could the theme be the problem here? Thanks for your time. – Tarek Mostafa Jan 17 '18 at 18:49