
I'm currently using the DuckDuckGo icon utility to fetch the favicons for certain webpages. To fetch an icon, you have to add ".ico" at the very end of the request, for example: https://icons.duckduckgo.com/ip2/www.google.com.ico

I'm using a WebClient to download the favicon, but it doesn't seem to download it completely: every time I open the file it appears corrupted and throws an error stating that "the file header can't be read".

I've tried the following so far (my WebClient is called client, the icon to set is called favicon and the path to the icon file is called favicon_path):

Uri favicon_url = new Uri(
    "https://icons.duckduckgo.com/ip2/" + gBrowser.Url.Host.ToString() + ".ico");
client.DownloadFile(@favicon_url, favicon_path);
favicon = new Icon(favicon_path);

and

Uri favicon_url = new Uri("https://icons.duckduckgo.com/ip2/"
    + gBrowser.Url.Host.ToString().Replace(".", "%2E") + ".ico");
client.DownloadFile(@favicon_url, favicon_path);
favicon = new Icon(favicon_path);

I'm guessing that the multiple periods ('.') in the favicon_url are responsible, so my question is: How can I download the favicon using a WebClient (or something similar) if it has multiple periods in its name? Or if not the periods, why can't I read the file downloaded from DuckDuckGo?

Peter Duniho
  • _"I'm guessing that the multiple periods (.) in the favicon_url are responsible"_ -- that sounds like a very poor guess to me. Either the URL is valid or it's not. If it's not, you'd get an error on the HTTP request, not invalid data. I think it's likely the bigger issue is that web sites don't use actual Windows icon files for their icons. They are actually bitmaps and need to be read as such. You seem to be trying to pass the downloaded file to an `Icon` constructor, but I doubt the file is actually a valid .ico file (in spite of the extension the web site is making you use). – Peter Duniho Jun 17 '17 at 06:45
  • The multiple periods shouldn't be a problem. Can you open the downloaded file in a text editor such as Notepad and confirm whether a binary file is being saved? Possibly it is an HTML error page that is being saved. – James Jun 17 '17 at 06:49
  • What does `favicon_url.ToString()` return? Does the code work if you test using `https://icons.duckduckgo.com/ip2/www.google.com.ico`? – mjwills Jun 17 '17 at 06:50
  • Well, I stand corrected (by my own inspection of the data) on the question of whether it's a valid .ico file or not. It is. But I still contend that the download issue has nothing to do with the periods. You are corrupting the data some other way. – Peter Duniho Jun 17 '17 at 06:54
  • "gBrowser" is a Web Browser control, so it's just getting the host of the URL loaded in the Web Browser control. The "https://icons.duckduckgo.com/ip2/" always returns an Icon file, it does all the heavy lifting in terms of fetching the website's favicon (including converting it into an Icon if necessary). @mjwills The code doesn't work even if tested using the URL suggested. –  Jun 17 '17 at 07:01
  • @PeterDuniho Yeah that seems to be the issue, I just can't seem to figure out what is corrupting the file; the only thing I can think of corrupting it would be whilst it's trying to download the file. –  Jun 17 '17 at 07:02
  • Do you have a link to the DuckDuckGo API docs? I doubt `WebClient` is doing anything other than saving exactly the bytes the web server sends. But it's possible the bytes sent are somehow reinterpreted by an actual web browser, in a way that `WebClient` doesn't know how to do. I would expect the docs to describe that. – Peter Duniho Jun 17 '17 at 07:07
  • @PeterDuniho The only link to some sort of API documentation I could find was this: https://duckduckgo.com/api There's also the Duck.co forums, but they don't seem to have anything on the topic of fetching favicons (from what I've looked through so far). –  Jun 17 '17 at 07:19
  • https://stackoverflow.com/questions/4567313/uncompressing-gzip-response-from-webclient may be of interest – mjwills Jun 17 '17 at 08:06

1 Answer


Okay, here's what you need to do (after the call to DownloadFile()):

using (Stream inputStream = File.OpenRead(favicon_path))
using (Stream gzipStream = new GZipStream(inputStream, CompressionMode.Decompress))
{
    MemoryStream copyStream = new MemoryStream();

    gzipStream.CopyTo(copyStream);
    copyStream.Position = 0;

    favicon = new Icon(copyStream);
}

I noticed that the downloaded file was much smaller than the actual .ico file. That suggested the data was being compressed somehow. Gzip is the de facto cross-platform stream-compression format, so I made a guess and tried decompressing the data as if it were compressed with gzip. And sure enough, it was.

Note above that you need to decompress the data into an intermediate buffer first (I used a MemoryStream object). The Icon constructor will try to seek the stream, which is not supported on a GZipStream object (it decompresses forward-only as it reads). So, you need to decompress the data into a Stream object that is seekable.
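
If you would rather confirm the gzip guess than rely on file size, a minimal sketch is to look at the first two bytes of the downloaded file: gzip data starts with the magic bytes 0x1F 0x8B. The class and method names below (GzipSniffer, LooksLikeGzip) are hypothetical and only for illustration.

```csharp
using System.IO;

static class GzipSniffer
{
    // Returns true when the file starts with the gzip magic bytes 0x1F 0x8B.
    // ReadByte() returns -1 at end of stream, so empty or one-byte files yield false.
    public static bool LooksLikeGzip(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            int first = stream.ReadByte();
            int second = stream.ReadByte();
            return first == 0x1F && second == 0x8B;
        }
    }
}
```

Calling GzipSniffer.LooksLikeGzip(favicon_path) after DownloadFile() would let you decompress only when the check passes and hand the file to Icon directly otherwise.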

Note also that there is an alternative solution using HttpWebRequest, which does support decompression during download. This is instead of using WebClient.DownloadFile(), not after as in the other example above.

You still have to copy to an intermediate buffer first (again, because Icon wants to seek the source stream, which is not seekable). But this approach allows the data to be read straight from the remote server into an Icon object, without requiring the intermediate file:

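HttpWebRequest request = (HttpWebRequest)WebRequest.Create(favicon_url);

// You can also include `DecompressionMethods.Deflate` here, for a more general solution
request.AutomaticDecompression = DecompressionMethods.GZip;

// Dispose the response and its stream once the data has been copied
using (WebResponse response = request.GetResponse())
using (Stream responseStream = response.GetResponseStream())
{
    // Icon needs a seekable stream, so buffer the decompressed bytes first
    MemoryStream copyStream = new MemoryStream();

    responseStream.CopyTo(copyStream);
    copyStream.Position = 0;
    favicon = new Icon(copyStream);
}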
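
If you want to keep calling WebClient.DownloadFile(), one common pattern (the one discussed in the question linked in the comments) is to subclass WebClient and turn on automatic decompression for the requests it creates. This is only a sketch under that assumption; the class name GZipWebClient is hypothetical.

```csharp
using System;
using System.Net;

// A WebClient whose HTTP requests ask the framework to decompress
// gzip- or deflate-encoded responses transparently.
class GZipWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP(S) requests expose AutomaticDecompression.
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }

        return request;
    }
}
```

With client created as a GZipWebClient, the original client.DownloadFile(favicon_url, favicon_path) call should then write the decompressed bytes, so new Icon(favicon_path) can read the file without a separate GZipStream step.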
Peter Duniho
  • That seems to work perfectly, although the website I've been using as a test (youtube.com) seems to have a PNG set as their "shortcut icon" with the extension renamed to appear as though it was an icon. So I'm going to have to go back to my other option of converting the PNG file into an Icon without losing its quality. Thanks anyways! –  Jun 17 '17 at 08:22