
I'm trying to crawl a large number of images from my old website because it's soon going to be shut down. All of the images are in JPEG format and all of them download successfully, but some of them display with a green-pink color cast on my local (Windows) computer.

I found that none of the corrupted pictures have color-space metadata or an embedded color profile, but I'm not sure this is the problem, since I'm not familiar with image processing. I couldn't find any setting in C# to set the color profile to RGB or something similar. This is my code:

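using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Net;

private static Image GetImageFromUrl(string url)
{
    HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(url);
    try
    {
        using (HttpWebResponse httpWebResponse = (HttpWebResponse)httpWebRequest.GetResponse())
        using (Stream stream = httpWebResponse.GetResponseStream())
        {
            // Image.FromStream requires its source stream to stay open for the
            // Image's lifetime, so buffer the body in a MemoryStream the Image keeps.
            var buffer = new MemoryStream();
            stream.CopyTo(buffer);
            buffer.Position = 0;
            return Image.FromStream(buffer);
        }
    }
    catch (WebException)
    {
        return null;
    }
}

private static void SaveImage(string folderName, string fileName, Image img)
{
    if (img == null || string.IsNullOrEmpty(folderName))
    {
        return;
    }
    string path = Path.Combine(@"D:\Files", folderName);
    Directory.CreateDirectory(path); // does nothing if the folder already exists
    using (img)
    {
        // Decodes and then re-encodes the JPEG with GDI+'s default settings (lossy)
        img.Save(Path.Combine(path, fileName), ImageFormat.Jpeg);
    }
}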

SaveImage(folderName, fileName, GetImageFromUrl(resultUrl));

This is what the pictures look like in the browser (left) and when downloaded with this program (right):

[Image: the same picture as shown in the browser (left) and as saved by the program (right)]

Thank you for your help.
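Regarding the missing color-space metadata mentioned above: one way to compare good and bad files is to read their JPEG header segments directly. The sketch below is a hypothetical diagnostic (the `JpegInspector` and `JpegColorInfo` names are made up for illustration, not part of any library). It walks the marker segments up to the start of the scan and reports the number of color components in the frame header, whether JFIF (APP0), Adobe (APP14) or ICC profile (APP2) segments are present, and the Adobe transform flag. A plausible suspect, not confirmed anywhere in this thread, is a file whose color model a decoder has to guess, for example a three-component JPEG stored as RGB (Adobe transform 0, no JFIF segment) being decoded as YCbCr, or a four-component CMYK/YCCK file.

```csharp
using System.IO;
using System.Text;

// Hypothetical diagnostic types, not part of the original post.
public sealed class JpegColorInfo
{
    public int ComponentCount { get; set; }        // Nf in the SOFn frame header: 1 = gray, 3 = YCbCr or RGB, 4 = CMYK or YCCK
    public bool HasJfif { get; set; }              // APP0 "JFIF" segment present (JFIF implies YCbCr or grayscale)
    public int AdobeTransform { get; set; } = -1;  // APP14 "Adobe" transform byte: 0 = none, 1 = YCbCr, 2 = YCCK; -1 = no Adobe segment
    public bool HasIccProfile { get; set; }        // APP2 "ICC_PROFILE" segment present
}

public static class JpegInspector
{
    // Walks the marker segments from SOI up to SOS/EOI and collects color-related hints.
    public static JpegColorInfo Inspect(byte[] data)
    {
        if (data.Length < 2 || data[0] != 0xFF || data[1] != 0xD8)
            throw new InvalidDataException("Missing SOI marker; not a JPEG file.");

        var info = new JpegColorInfo();
        int pos = 2;
        while (pos + 4 <= data.Length)
        {
            if (data[pos] != 0xFF)
                throw new InvalidDataException("Expected a marker at offset " + pos + ".");
            byte marker = data[pos + 1];
            if (marker == 0xFF) { pos++; continue; }              // fill byte before a marker
            if (marker == 0xDA || marker == 0xD9) break;           // SOS or EOI: no more header segments
            if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) { pos += 2; continue; } // markers without a length field

            int length = (data[pos + 2] << 8) | data[pos + 3];    // big-endian, counts its own two bytes
            int start = pos + 4;                                   // first payload byte
            int end = pos + 2 + length;                            // one past the last payload byte
            if (length < 2 || end > data.Length)
                throw new InvalidDataException("Truncated segment at offset " + pos + ".");

            bool isSof = marker >= 0xC0 && marker <= 0xCF
                         && marker != 0xC4 && marker != 0xC8 && marker != 0xCC; // exclude DHT, JPG, DAC
            if (isSof && end - start >= 6)
                info.ComponentCount = data[start + 5];             // precision(1), height(2), width(2), Nf(1)
            else if (marker == 0xE0 && StartsWith(data, start, end, "JFIF\0"))
                info.HasJfif = true;
            else if (marker == 0xEE && end - start >= 12 && StartsWith(data, start, end, "Adobe"))
                info.AdobeTransform = data[start + 11];            // "Adobe", version(2), flags0(2), flags1(2), transform(1)
            else if (marker == 0xE2 && StartsWith(data, start, end, "ICC_PROFILE\0"))
                info.HasIccProfile = true;

            pos = end;
        }
        return info;
    }

    // True if the segment payload [start, end) begins with the given ASCII tag.
    private static bool StartsWith(byte[] data, int start, int end, string ascii)
    {
        byte[] tag = Encoding.ASCII.GetBytes(ascii);
        if (end - start < tag.Length) return false;
        for (int i = 0; i < tag.Length; i++)
            if (data[start + i] != tag[i]) return false;
        return true;
    }
}
```

For example, `JpegInspector.Inspect(File.ReadAllBytes(path))` on one correctly displayed and one green-pink file (paths of your choosing) would show whether the broken ones differ in component count or markers. Which interpretation GDI+ applies in each case is not something the thread establishes.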

Nyerguds
balcsok
  • This seems more like a job for Photoshop than C#... – Jason Watkins May 19 '15 at 00:02
  • It may be possible to manually "repair" the pictures one by one in PS, but as I said, there are a lot of them. I understand C# is not meant for manipulating images, but apparently I'm missing some parameter that makes some of the pictures save incorrectly. – balcsok May 19 '15 at 00:07
  • Hmm... I just noticed that you are converting the response stream to an Image and re-saving it. Are the original images not already JPEGs? If they are, you should be able to just save the stream to a file. – Jason Watkins May 19 '15 at 00:11
  • Yes, the originals are also JPEGs. Saving the stream directly to a file, based on http://stackoverflow.com/questions/411592/how-do-i-save-a-stream-to-a-file-in-c, did the trick. The saved images still carry the same (faulty) color information, as expected, but common image viewers display them correctly (I don't know how). Thank you; you may want to post this as an answer so I can accept it. – balcsok May 19 '15 at 00:39
  • Re-saving lossy-compression images seems like a bad idea anyway when all you're really trying to do is download them. – Nyerguds May 24 '18 at 07:28
  • @Nyerguds How would you recommend improving this? Even though the task was done a long time ago, updating the solution may help others. – balcsok May 24 '18 at 07:53
  • @Csoky No, I think saving the bytes directly, as you did, is the perfect solution here. The fact that the .NET Framework has trouble loading your images is actually unrelated to the original purpose of your program, which was really just saving files from the website. – Nyerguds May 24 '18 at 09:24
  • @Nyerguds Exactly, just saving the stream as received, like a crawler does, makes more sense in this case and is probably faster than tinkering with image conversions. Thanks for the feedback anyway. – balcsok May 24 '18 at 19:22

1 Answer


OK, it seems this problem can be bypassed by saving the response stream directly to a file, without converting it to an Image object on the local machine. The faulty color information stays in the file, but somehow Windows Picture Viewer displays the saved images correctly (no strange color casts), and so do Photoshop and other programs.

As suggested by Jason Watkins in the comments, my code now looks like this:

using System;
using System.IO;
using System.Net;

private static void SaveImageFromUrl(string folderName, string fileName, string url)
{
    HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(url);
    try
    {
        using (HttpWebResponse httpWebResponse = (HttpWebResponse)httpWebRequest.GetResponse())
        using (Stream stream = httpWebResponse.GetResponseStream())
        {
            // Save inside the using block, while the response stream is still open;
            // it is disposed when this block ends.
            SaveImage(folderName, fileName, stream);
        }
    }
    catch (WebException e)
    {
        Console.WriteLine("Image with URL " + url + " could not be downloaded: " + e.Message);
    }
}

private static void SaveImage(string folderName, string fileName, Stream img)
{
    if (img == null || string.IsNullOrEmpty(folderName))
    {
        return;
    }
    string path = Path.Combine(@"D:\Files", folderName);
    Directory.CreateDirectory(path); // does nothing if the folder already exists

    // Copy the downloaded bytes unchanged (no decoding or re-encoding);
    // the caller owns img and closes it.
    using (var fileStream = File.Create(Path.Combine(path, fileName)))
    {
        img.CopyTo(fileStream);
    }
}

SaveImageFromUrl(folderName, fileName, resultUrl);
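`HttpWebRequest`/`WebRequest` is marked obsolete in .NET 6 and later, where `HttpClient` is the recommended replacement. A minimal sketch of the same save-the-bytes-unchanged approach with `HttpClient` might look like the following. The class and method names are hypothetical, and the root folder is passed in (e.g. `@"D:\Files"`) rather than hard-coded so the copy step can be exercised on its own. Like the code above, it never decodes the image, so the file on disk is byte-for-byte what the server sent.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not from the original answer.
public static class ImageDownloader
{
    // One shared HttpClient instance is reused for every download.
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<bool> SaveImageFromUrlAsync(string rootFolder, string folderName, string fileName, string url)
    {
        try
        {
            // ResponseHeadersRead: start streaming the body instead of buffering it all in memory first.
            using (HttpResponseMessage response = await Client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
            {
                if (!response.IsSuccessStatusCode)
                {
                    Console.WriteLine("Image with URL " + url + " returned HTTP " + (int)response.StatusCode + ".");
                    return false;
                }
                using (Stream body = await response.Content.ReadAsStreamAsync())
                {
                    await SaveStreamAsync(rootFolder, folderName, fileName, body);
                }
            }
            return true;
        }
        catch (HttpRequestException e)
        {
            Console.WriteLine("Image with URL " + url + " could not be downloaded: " + e.Message);
            return false;
        }
    }

    // Copies the bytes unchanged: no decoding, no JPEG re-encoding.
    public static async Task SaveStreamAsync(string rootFolder, string folderName, string fileName, Stream source)
    {
        string folder = Path.Combine(rootFolder, folderName);
        Directory.CreateDirectory(folder); // no-op if the folder already exists
        using (FileStream target = File.Create(Path.Combine(folder, fileName)))
        {
            await source.CopyToAsync(target);
        }
    }
}
```

It would be called as `await ImageDownloader.SaveImageFromUrlAsync(@"D:\Files", folderName, fileName, resultUrl);`. Reusing a single `HttpClient` rather than creating one per image avoids exhausting sockets when crawling many files.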
balcsok