
So I just started learning C# last night. The first project I started was a simple image downloader, which downloads all images from a website using HtmlElementCollection.

Here's what I got so far:

    private void dl_Click(object sender, EventArgs e)
    {
        System.Net.WebClient wClient = new System.Net.WebClient();

        HtmlElementCollection hecImages = Browser.Document.GetElementsByTagName("img");

        // Count - 1 would skip the last image; iterate over every element.
        for (int i = 0; i < hecImages.Count; i++)
        {
            char[] ftype = new char[4];
            string gtype;

            try
            {
                // Take the last four characters of the src as the file extension.
                string src = hecImages[i].GetAttribute("src");
                src.CopyTo(src.Length - 4, ftype, 0, 4);
                gtype = new string(ftype);

                // Copy the image to the local path.
                wClient.DownloadFile(src, absPath + i.ToString() + gtype);
            }
            catch (System.Net.WebException)
            {
                expand_Exception_Log();
                System.Threading.Thread.Sleep(50);
            }
        }
    }

Basically it renders the page in advance and then looks for the images. This works pretty well, but for some reason it only downloads the thumbnails, not the full (high-res) images.
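As an aside, grabbing the last four characters of the `src` string is a fragile way to get the file type: it breaks on `.jpeg`, on URLs with query strings, and on extensionless URLs. A minimal sketch of a safer approach, using `Uri` and `Path.GetExtension` (the example.com URLs are placeholders):

```csharp
using System;
using System.IO;

class ExtensionDemo
{
    // Derive a file extension from an image URL instead of
    // copying the last four characters of the src string.
    public static string GetImageExtension(string src)
    {
        // Parsing the URL first strips any query string.
        var uri = new Uri(src, UriKind.Absolute);
        string ext = Path.GetExtension(uri.AbsolutePath);
        // Fall back to .jpg when the URL carries no extension.
        return string.IsNullOrEmpty(ext) ? ".jpg" : ext;
    }

    static void Main()
    {
        Console.WriteLine(GetImageExtension("http://example.com/cat.jpeg?size=large")); // .jpeg
        Console.WriteLine(GetImageExtension("http://example.com/images/dog.png"));      // .png
    }
}
```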

Additional Sources:

Documentation on WebClient.DownloadFile: http://msdn.microsoft.com/en-us/library/ez801hhe(v=vs.110).aspx

The DownloadFile method downloads to a local file data from the URI specified by the address parameter.

  • What's the webpage you are hitting? Did you try some different pages on different sites? Are you sure the `src` you are getting are the url's of the full image and not the thumbnail? If you take one of those url's and paste it into the address bar of your browser, what do you get? Full size, or thumbnail? – Matt Burland Oct 04 '14 at 02:20
  • I've tried: EEVBlog, My own homepage, Google images, Stackoverflow, and some news-sites. I've created a log which displayed all src strings in my console before downloading the images - with those links I usually got the full sized image. (Except on Google). – Erwin Schrödinger Oct 04 '14 at 02:27
  • I have edited your title. Please see, "[Should questions include “tags” in their titles?](http://meta.stackexchange.com/questions/19190/)", where the consensus is "no, they should not". – John Saunders Oct 04 '14 at 03:56

1 Answer


Take a gander at How can I use HTML Agility Pack to retrieve all the images from a website?

That answer uses a library called HTML Agility Pack to collect every `<img src="..." />` tag on a website.

If that topic somehow disappears, I'm putting the code up here for those who need it but can't reach that topic.

    // List that will hold every image URL found.
    public List<string> ImageList = new List<string>();

    public void GetAllImages()
    {
        // WebClient fetches the raw HTML of the page.
        WebClient x = new WebClient();

        // Download the page source from the URL.
        string source = x.DownloadString(@"http://www.google.com");

        // Parse the downloaded source with HTML Agility Pack.
        HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
        document.LoadHtml(source);

        // For every <img> node in the document, take its src attribute.
        foreach (var link in document.DocumentNode.Descendants("img")
                                     .Select(i => i.Attributes["src"]))
        {
            // Store each link found in the list.
            ImageList.Add(link.Value);
        }
    }

Since you said you are rather new, note that you can add HTML Agility Pack easily with NuGet: right-click your project, click Manage NuGet Packages, search the Online tab for HTML Agility Pack, and click Install. Then reference it in your code with `using HtmlAgilityPack;`.

After all that, you should be fine writing a method that downloads every item in the ImageList list created above.
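As a rough illustration of that last step, here is a minimal sketch of such a download loop. It assumes the collected URLs are absolute; the `"downloads"` directory name and the numbering scheme are just placeholder choices:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

class ImageDownloader
{
    // Download every URL in imageList into targetDir, numbering the
    // files and keeping each URL's original extension.
    public static void DownloadAll(List<string> imageList, string targetDir)
    {
        Directory.CreateDirectory(targetDir);
        using (var client = new WebClient())
        {
            for (int i = 0; i < imageList.Count; i++)
            {
                string ext = Path.GetExtension(new Uri(imageList[i]).AbsolutePath);
                string target = Path.Combine(targetDir, i + ext);
                try
                {
                    client.DownloadFile(imageList[i], target);
                }
                catch (WebException ex)
                {
                    // Skip unreachable images rather than aborting the run.
                    Console.WriteLine("Skipped {0}: {1}", imageList[i], ex.Message);
                }
            }
        }
    }
}
```

Calling it would look like `ImageDownloader.DownloadAll(ImageList, "downloads");` after `GetAllImages()` has run.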

Good luck!

EDIT: Added comments explaining what each section does.

EDIT2: Updated snippet to reflect user comment.

Brandon Palmer
  • This basically stores the URI in image_links[] instead of the whole img context 'src=URI ...' in hecImages[], right? I'll try if this package gives me a better result. Thanks already! – Erwin Schrödinger Oct 04 '14 at 18:27
  • Well, using that image_links[] array you can use a simple `foreach(string uri in image_links)` method in another function to download all the images. – Brandon Palmer Oct 04 '14 at 18:28
  • My edit got rejected, so here's what I had to do to get it to work: (1) document.LoadHtml instead of document.Load, as it can't resolve the path (ArgumentException). (2) DocumentElement doesn't exist anymore in the current version of HAP. Instead you have to use DocumentNodes. (3) link["src"] -[]-indexes are not allowed on a expression of type HAP.HtmlNode. Instead you have to call .Attribute: link.Attribute["src"]. To return a string you need to call link.Attribute["src"].Value. – Erwin Schrödinger Oct 04 '14 at 22:38
  • 1
    the above code is broke, the variable "x" is already used in a previous context. update .Select(i => i.Attributes["src"])) will compile. – mrogunlana Aug 29 '17 at 01:37