I am trying to download a file from a server using HttpWebRequest (System.Net). It works in general, but some links give me trouble. The links look like this:

http://cdn.somesite.com/r1KH3Z%2FaMY6kLQ9Y4nVxYtlfrcewvKO9HLTCUBjU8IBAYnA3vzE1LGrkqMrR9Nh3jTMVFZzC7mxMBeNK5uY3nx5K0MjUaegM3crVpFNGk6a6TW6NJ3hnlvFuaugE65SQ4yM5754BM%2BLagqYvwvLAhG3DKU9SGUI54UAq3dwMDU%2BMl9lUO18hJF3OtzKiQfrC/the_file.ext

The code looks basically like this:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(link);
WebResponse response = request.GetResponse();

GetResponse() always throws an exception (Error 400 Bad Request) for these links. However, I know the link works because I can download the file with Firefox without problems.

I also tried decoding the link with Uri.UnescapeDataString(link), but the decoded link doesn't work even in Firefox.

Other links work perfectly fine this way; only these fail.

Edit:

Okay, I found something out using Wireshark:

If I open the link using Firefox, this is sent:

GET /r1KH3Z%2FaMY6kLQ9Y4nVxYp5DyNc49t5kJBybvjbcsJJZ0IUJBtBWCgri3zfTERQught6S8ws1a%2BCo0RS5w3KTmbL7i5yytRpn2QELEPUXZTGYWbAg5eyGO2yIIbmGOcFP41WdrFRFcfk4hAIyZ7rs4QgbudzcrJivrAaOTYkEnozqmdoSCCY8yb1i22YtEAV/epd_outpost_12adb.flv HTTP/1.1
Host: cdn.somesite.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Connection: keep-alive

My code sends the following instead. I think only the first line is the problem, because WebRequest.Create(link) decodes the URL:

GET /r1KH3Z/aMY6kLQ9Y4nVxYp5DyNc49t5kJBybvjbcsJJZ0IUJBtBWCgri3zfTERQught6S8ws1a%2BCo0RS5w3KTmbL7i5yytRpn2QELEPUXZTGYWbAg5eyGO2yIIbmGOcFP41WdrFRFcfk4hAIyZ7rs6Mmh1EsQQ4vJVYUwtbLBDNx9AwCHlWDfzfSWIHzaaIo/epd_outpost_12adb.flv HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0
Host: cdn.somesite.com

(%2F is replaced with /.)

Another edit:

I found out that the Uri class decodes the URL automatically:

Uri uri = new Uri(link); // link is not decoded
Debug.WriteLine(uri.ToString()); // link is decoded here

How can I prevent this?

Thanks in advance for your help.

Ian
Pasukaru
  • Hard to tell if you don't give the real URL. Try to look at the communication that a browser does to get the response (with the Live HTTP Headers add-on, or a tool like Wireshark); that should give you a hint what to change in your request. – voidengine May 02 '12 at 12:39
  • Uri class does that, it had a constructor with parameter dontEscape but it's obsolete and does nothing – Antonio Bakula May 02 '12 at 14:06
  • Thanks, I just found this out. Is there any way to prevent it from doing this? – Pasukaru May 02 '12 at 14:09

2 Answers

By default, the Uri class will not allow an escaped / character (%2f) in a URI (even though this appears to be legal in my reading of RFC 3986).

Uri uri = new Uri("http://example.com/embed%2fded");
Console.WriteLine(uri.AbsoluteUri); // prints: http://example.com/embed/ded

(Note: don't use Uri.ToString to print URIs; it returns an unescaped form of the URI. Use AbsoluteUri instead.)

According to the bug report for this issue on Microsoft Connect, this behaviour is by design, but you can work around it by adding the following to your app.config or web.config file:

<uri>
  <schemeSettings>
    <add name="http" genericUriParserOptions="DontUnescapePathDotsAndSlashes" />
  </schemeSettings>
</uri>

(Since WebRequest.Create(string) just delegates to WebRequest.Create(Uri), you would need to use this workaround no matter which method you call.)
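
To check whether the workaround has taken effect, a small console program along these lines may help. It is only a sketch: the class name EscapedSlashCheck is made up for illustration, and the URL is the example one from above. It compares the string passed to the constructor (Uri.OriginalString) with what the Uri class reports as AbsoluteUri; which branch is taken depends on the framework version and the configuration in use.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the original answer.
class EscapedSlashCheck
{
    static void Main()
    {
        // Example URL with an escaped slash (%2f) in the path.
        string original = "http://example.com/embed%2fded";
        Uri uri = new Uri(original);

        // OriginalString is the exact text passed to the constructor.
        Console.WriteLine("Original:    " + uri.OriginalString);

        // AbsoluteUri is the parsed, canonical form of the URI.
        Console.WriteLine("AbsoluteUri: " + uri.AbsoluteUri);

        // If %2f is still present, the escaped slash survived parsing.
        bool preserved =
            uri.AbsoluteUri.IndexOf("%2f", StringComparison.OrdinalIgnoreCase) >= 0;
        Console.WriteLine(preserved
            ? "Escaped slash preserved."
            : "Escaped slash was unescaped to '/'.");
    }
}
```

Running it once without and once with the configuration entry shows whether the setting is being picked up by the application.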

Bradley Grainger

This has now changed in .NET 4.5. By default you can now use escaped slashes. I posted more info on this (including screenshots) in the comments here: GETting a URL with an url-encoded slash

Glenn Block