15

I'm trying to download a number of pdf files automagically given a list of urls.

Here's the code I have:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

request.Method = "GET";

var encoding = new UTF8Encoding();

request.Headers.Add(HttpRequestHeader.AcceptLanguage, "en-gb,en;q=0.5");
request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate");

request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0";

HttpWebResponse resp = (HttpWebResponse)request.GetResponse();

BinaryReader reader = new BinaryReader(resp.GetResponseStream());

FileStream stream = new FileStream("output/" + date.ToString("yyyy-MM-dd") + ".pdf",FileMode.Create);

BinaryWriter writer = new BinaryWriter(stream);

while (reader.PeekChar() != -1)
{
    writer.Write(reader.Read());
}
writer.Flush();
writer.Close();

So, I know the first part works. I was originally getting it and reading it using a TextReader - but that gave me corrupted pdf files (since pdfs are binary files).

Right now if I run it, reader.PeekChar() is always -1 and nothing happens - I get an empty file.

While debugging it, I noticed that reader.Read() was actually giving different numbers when I was invoking it - so maybe Peek is broken.

So I tried something very dirty

try
{
    while (true)
    {
        writer.Write(reader.Read());
    }
}
catch
{
}
writer.Flush();
writer.Close();

Now I'm getting a very tiny file with some garbage in it, but it's still not what I'm looking for.

Can anyone point me in the right direction?

Additional Information:

The response headers don't suggest it's compressed or anything else.

HTTP/1.1 200 OK
Content-Type: application/pdf
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Fri, 10 Aug 2012 11:15:48 GMT
Content-Length: 109809
Aabela

4 Answers

23

Skip the BinaryReader and BinaryWriter and just copy the input stream to the output FileStream. Briefly:

var fileName = "output/" + date.ToString("yyyy-MM-dd") + ".pdf";
using (var stream = File.Create(fileName))
  resp.GetResponseStream().CopyTo(stream);
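
A slightly fuller sketch of the same idea, in case it helps. As far as I can tell from the `BinaryReader` documentation, `PeekChar()` returns -1 on a stream that cannot seek (a network response stream cannot), and `Read()` decodes a character rather than returning a raw byte, so `writer.Write(reader.Read())` writes a four-byte `int` per character. That would explain both the empty file and the garbage. The names `PdfDownloader`, `SaveStream` and `Download` are made up for illustration, and setting `AutomaticDecompression` is my assumption about how to handle the `Accept-Encoding` header the question sends.

```csharp
using System;
using System.IO;
using System.Net;

static class PdfDownloader
{
    // Copies raw bytes from any readable stream to a file; no text decoding is involved.
    // Returns the number of bytes in the resulting file.
    public static long SaveStream(Stream source, string path)
    {
        using (var file = File.Create(path))
        {
            source.CopyTo(file);
            return file.Length;
        }
    }

    // Hypothetical wrapper: downloads one URL to one file.
    public static long Download(string url, string path)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);

        // Assumption: let the framework negotiate and undo gzip/deflate
        // instead of adding the Accept-Encoding header by hand.
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;

        // Dispose the response and its stream so the connection is released.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var body = response.GetResponseStream())
        {
            return SaveStream(body, path);
        }
    }
}
```

Calling `PdfDownloader.Download(url, "output/" + date.ToString("yyyy-MM-dd") + ".pdf")` should return a byte count that, for an uncompressed response, can be compared against the `Content-Length` header (109809 in the question).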
Martin Liversage
  • 2
    I wonder if there is a way to get this into a byte array instead of sending it to the file system? – MetaGuru Aug 24 '15 at 20:12
  • 3
    @ioSamurai: Replace `File.Create(fileName)` with `new MemoryStream()` and then at the end of the `using` block retrieve the bytes: `var bytes = stream.ToArray()`. A `MemoryStream` does not use any unmanaged resources so you can also drop the `using` block entirely. – Martin Liversage Aug 24 '15 at 20:33
  • @MartinLiversage hmm I have tried this a few times and while I do get a byte stream, when I ultimately write it to disk the pdf file is corrupt... however making the same request from the browser (I am using WebRequest in code) gives the PDF file fine. This may actually be some strange behavior related to how Report Server serves up PDF responses to web requests... – MetaGuru Aug 24 '15 at 20:40
  • @ioSamurai: I am pretty sure that the few lines of code I have provided do not corrupt a PDF file, and I would be surprised if Report Server has a "strange behavior". To troubleshoot, compare the first few bytes and the length of the file as produced by your own code, as seen in transit with a tool like Fiddler, and as retrieved by a web browser. – Martin Liversage Aug 24 '15 at 20:54
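
The in-memory variant and the "compare the first few bytes" check from the comments above could look roughly like this. It is only a sketch: `StreamBytes`, `ReadAllBytes` and `LooksLikePdf` are hypothetical names, and the check relies only on the fact that a PDF file starts with the ASCII marker `%PDF-`.

```csharp
using System.IO;

static class StreamBytes
{
    // Reads a stream to the end and returns its contents as a byte array.
    public static byte[] ReadAllBytes(Stream source)
    {
        using (var buffer = new MemoryStream())
        {
            source.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    // Quick sanity check: does the data begin with the "%PDF-" marker?
    public static bool LooksLikePdf(byte[] bytes)
    {
        byte[] marker = { 0x25, 0x50, 0x44, 0x46, 0x2D }; // "%PDF-"
        if (bytes == null || bytes.Length < marker.Length) return false;
        for (int i = 0; i < marker.Length; i++)
        {
            if (bytes[i] != marker[i]) return false;
        }
        return true;
    }
}
```

If `LooksLikePdf` returns false for the downloaded bytes, the response body is probably something other than the PDF (an error page, a login page, or a still-compressed body), which narrows down where the corruption comes from.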
10

Why not use the WebClient class?

using (WebClient webClient = new WebClient())
{
    webClient.DownloadFile("url", "filePath");
}
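
Since the question mentions a list of URLs, here is one way the `WebClient` approach might be extended to a batch. Treat it as a sketch under assumptions: `BatchDownloader`, `TargetPath` and `DownloadAll` are invented names, and deriving the file name from the last URL path segment is my choice (the question names files by date instead).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

static class BatchDownloader
{
    // Hypothetical naming scheme: use the last path segment of the URL,
    // falling back to a fixed name when the URL ends in a slash.
    public static string TargetPath(string outputDir, Uri url)
    {
        var name = Path.GetFileName(url.LocalPath);
        if (string.IsNullOrEmpty(name)) name = "download.pdf";
        return Path.Combine(outputDir, name);
    }

    // Downloads every URL in turn; one failure does not stop the batch.
    public static void DownloadAll(IEnumerable<string> urls, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        using (var client = new WebClient())
        {
            foreach (var url in urls)
            {
                var uri = new Uri(url);
                try
                {
                    client.DownloadFile(uri, TargetPath(outputDir, uri));
                }
                catch (WebException ex)
                {
                    Console.Error.WriteLine("Failed to download " + url + ": " + ex.Message);
                }
            }
        }
    }
}
```

Note that two URLs ending in the same file name would overwrite each other with this scheme, so a date-based or counter-based name may be the safer choice.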
2

Your question asks about WebClient, but your code uses raw HTTP requests and responses.

Why don't you actually use the System.Net.WebClient ?

using (System.Net.WebClient wc = new System.Net.WebClient())
{
    wc.DownloadFile("http://www.site.com/file.pdf",  "C:\\Temp\\File.pdf");
}
Eoin Campbell
0
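        // Assumes these methods live in a WinForms Form1 class that has a ProgressBar named progressBar,
        // with using directives for System, System.ComponentModel, System.Net and System.Windows.Forms.
        private void Form1_Load(object sender, EventArgs e)
        {
            WebClient webClient = new WebClient();
            webClient.DownloadFileCompleted += new AsyncCompletedEventHandler(Completed);
            webClient.DownloadProgressChanged += new DownloadProgressChangedEventHandler(ProgressChanged);
            // DownloadFileAsync takes (Uri, string); it creates or overwrites the target file itself.
            webClient.DownloadFileAsync(new Uri("https://www.colorado.gov/pacific/sites/default/files/Income1.pdf"), @"output/" + DateTime.Now.Ticks + ".pdf");
        }

        private void ProgressChanged(object sender, DownloadProgressChangedEventArgs e)
        {
            progressBar.Value = e.ProgressPercentage;
        }

        private void Completed(object sender, AsyncCompletedEventArgs e)
        {
            // e.Error is non-null when the download failed.
            MessageBox.Show(e.Error == null ? "Download completed!" : "Download failed: " + e.Error.Message);
        }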
srk