Download a PDF from a third party using ASP.NET HttpWebRequest/HttpWebResponse

Question

I want to pass a URL as a query-string parameter, e.g.

localhost/abc.aspx?url=http://www.site.com/report.pdf

and detect whether that URL returns a PDF file. If it does, the file should be saved automatically; otherwise an error should be shown.

Some pages use a handler to serve their files, and I want to detect and download the PDF in that case as well:

localhost/abc.aspx?url=http://www.site.com/page.aspx?fileId=223344

The above may return a PDF file.

What is the best way to handle this?

Thanks

score 1 · Accepted Answer · edited May 23 '17 at 12:29

You can download a PDF like this:

// Needs: using System; using System.IO; using System.Net;
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
// GetResponse() returns a WebResponse, so it has to be cast.
// The using block closes the response, even if an exception is thrown.
using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
{
    // Check the file type returned. ContentType may carry parameters,
    // e.g. "application/pdf; charset=binary", so keep only the part before ';'.
    string fileType = null;
    string contentType = response.ContentType;
    if (!string.IsNullOrEmpty(contentType))
    {
        fileType = contentType.Split(';')[0].Trim();
    }

    // See if it's a PDF (MIME types are case-insensitive)
    if (string.Equals(fileType, "application/pdf", StringComparison.OrdinalIgnoreCase))
    {
        // Save it. The response stream is not seekable, so stream.Length is not
        // available, and a single Read() may return only part of the data.
        // CopyTo keeps reading until the end of the stream.
        using (Stream stream = response.GetResponseStream())
        using (FileStream fileStream = File.Create(fileFullPath))
        {
            stream.CopyTo(fileStream);
        }
    }
}

The fact that it may come from an .aspx handler doesn't actually matter; what counts is the MIME type returned in the server response.

If you are getting a generic MIME type, such as application/octet-stream, then you must use a more heuristic approach.

Assuming you cannot simply rely on the file extension (e.g. for .aspx), you can copy the response to a MemoryStream first (see How to get a MemoryStream from a Stream in .NET?). Once you have the content in a memory stream, you can take a 'cheeky' peek at it (I say cheeky because it's not the correct way to parse a PDF file).
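As a minimal sketch of that buffering step, something like the following could work. The class and method names (StreamBuffering, ToMemoryStream) are made up for illustration and are not part of any library; Stream.CopyTo needs .NET 4.0 or later.

```csharp
using System.IO;

static class StreamBuffering
{
    // Hypothetical helper: copies any readable stream into a MemoryStream
    // and rewinds it so the caller can read from the beginning.
    public static MemoryStream ToMemoryStream(Stream source)
    {
        var buffer = new MemoryStream();
        source.CopyTo(buffer);   // reads until the source reports end of stream
        buffer.Position = 0;     // rewind for the next reader
        return buffer;
    }
}
```

You would call it as StreamBuffering.ToMemoryStream(response.GetResponseStream()). Note that this holds the whole download in memory, so it is only reasonable for files of moderate size.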

I'm not an expert on the PDF format, but I believe reading the first 5 characters as ASCII will yield "%PDF-", so you can identify a PDF with:

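bool isPDF;
// Read the first five bytes directly. Wrapping the MemoryStream in a StreamReader
// inside a using block would dispose the MemoryStream as well, so it could not
// be rewound afterwards; ReadLine() can also return null on an empty stream.
byte[] header = new byte[5];
memoryStream.Position = 0;
int bytesRead = memoryStream.Read(header, 0, header.Length);
isPDF = bytesRead == header.Length
    && System.Text.Encoding.ASCII.GetString(header) == "%PDF-";

// Set the memory stream back to the start so you can save the file
memoryStream.Position = 0;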
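Putting the MIME check and the header peek together might look roughly like the sketch below. PdfSniffer, LooksLikePdf and IsPdf are hypothetical names chosen for this example, and the decision to sniff only when the MIME type is missing or application/octet-stream is an assumption about what you want, not a rule.

```csharp
using System;
using System.IO;
using System.Text;

static class PdfSniffer
{
    // Hypothetical helper: true when the buffered content starts with "%PDF-".
    public static bool LooksLikePdf(MemoryStream content)
    {
        byte[] header = new byte[5];
        content.Position = 0;
        int bytesRead = content.Read(header, 0, header.Length);
        content.Position = 0; // rewind so the caller can still save the content
        return bytesRead == header.Length
            && Encoding.ASCII.GetString(header) == "%PDF-";
    }

    // Hypothetical helper: trust an explicit PDF MIME type, and fall back to
    // the header peek only for a missing or generic MIME type (an assumption).
    public static bool IsPdf(string contentType, MemoryStream content)
    {
        string mime = (contentType ?? "").Split(';')[0].Trim();
        if (mime.Equals("application/pdf", StringComparison.OrdinalIgnoreCase))
            return true;
        bool generic = mime.Length == 0
            || mime.Equals("application/octet-stream", StringComparison.OrdinalIgnoreCase);
        return generic && LooksLikePdf(content);
    }
}
```

With the response buffered into a MemoryStream called buffered, the save step could then be: if (PdfSniffer.IsPdf(response.ContentType, buffered)) File.WriteAllBytes(fileFullPath, buffered.ToArray());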

Some URLs return a MIME type of application/octet-stream, which may contain a file of any kind. How can we detect a PDF in that case? - soccer7, Oct 17 '14 at 15:39
How can we use Response.Write() to send it to the client's browser with Content-Type "application/pdf"? - soccer7, Oct 18 '14 at 15:31
Can you please point me to a resource where I can read more about streams? - soccer7, Oct 18 '14 at 17:45
