.NET Core 2.2 console application on Windows.

I'm exploring how to use HttpClient.GetAsync on a Stack Overflow share-style URL, e.g. https://stackoverflow.com/a/29809054/26086, which returns a 302 redirect to a URL with a hash (fragment) in it.

static async Task Main()
{
    var client = new HttpClient();

    // 1. Doesn't work - has a hash in URL
    var url = "https://stackoverflow.com/questions/29808915/why-use-async-await-all-the-way-down/29809054#29809054";
    HttpResponseMessage rm = await client.GetAsync(url);
    Console.WriteLine($"Status code: {(int)rm.StatusCode}"); // 400 Bad Request

    // 2. Does work - no hash
    url = "https://stackoverflow.com/questions/29808915/why-use-async-await-all-the-way-down/29809054";
    rm = await client.GetAsync(url);
    Console.WriteLine($"Status code: {(int)rm.StatusCode}"); // 200 OK

    // 3. Doesn't work as the 302 redirect goes to the first URL above with a hash
    url = "https://stackoverflow.com/a/29809054/26086";
    rm = await client.GetAsync(url);
    Console.WriteLine($"Status code: {(int)rm.StatusCode}"); // 400 Bad Request
}

I'm crawling my blog, which contains many of these SO share-style short links.

Update/Workaround: With thanks to @rohancragg, I found that turning off AllowAutoRedirect and then reading the URI from the returned Location header worked.

// as some autoredirects fail due to #fragments in url, handle redirects manually
var handler = new HttpClientHandler { AllowAutoRedirect = false };
var client = new HttpClient(handler);

var url = "https://stackoverflow.com/a/29809054/26086";    
HttpResponseMessage rm = await client.GetAsync(url);

// gives the desired new URL which can then GetAsync
Uri u = rm.Headers.Location;
Erik Philips
Dave Mateer
  • URLs sent to servers don't contain the `#` fragment. It's only of use in a client such as e.g. a browser. – Damien_The_Unbeliever Jan 03 '19 at 15:51
  • That makes sense, thank you. I'm now digging into why it gives a 400, as I would like HttpClient to ignore the hash. I've updated the question to highlight the reason I need to do this, i.e. a Stack Overflow share URL. – Dave Mateer Jan 03 '19 at 16:28
  • As @Damien_The_Unbeliever implies, you'll just need to strip off the hash and everything after it - all that does is tell the browser to jump to that anchor tag in the HTML page (see: https://www.w3schools.com/jsref/prop_anchor_hash.asp). So this would mean that your option 2 is your only option in this case... – rohancragg Jan 04 '19 at 09:54
  • You could also use the Uri class to parse the Uri and ignore any 'fragments': https://learn.microsoft.com/en-us/dotnet/api/system.uri.fragment – rohancragg Jan 04 '19 at 10:00
  • Thanks @rohancragg - but what if I'm requesting https://stackoverflow.com/a/29809054/26086 which then returns the 302, and automatically requests the URL with a hash in it. Maybe I'll have to stop the autoredirect https://stackoverflow.com/a/10647245/26086 and then strip off the hash, then do another request. – Dave Mateer Jan 04 '19 at 10:00

2 Answers

As @Damien_The_Unbeliever implies in a comment, you just need to strip off the hash and everything after it. The fragment only tells the browser to jump to that anchor in the HTML page (see: https://w3schools.com/jsref/prop_anchor_hash.asp).

You could also use the Uri class to parse the Uri and ignore any 'fragments': https://learn.microsoft.com/en-us/dotnet/api/system.uri.fragment
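As a minimal sketch of that idea (the class and method names here are hypothetical, not from any library), Uri.GetLeftPart(UriPartial.Query) returns everything up to and including the query string, which leaves the fragment out:

```csharp
using System;

static class FragmentHelper
{
    // Hypothetical helper: returns a copy of an absolute URI without its #fragment.
    public static Uri StripFragment(Uri uri)
    {
        // UriPartial.Query keeps the scheme, authority, path and query,
        // and drops the fragment.
        return new Uri(uri.GetLeftPart(UriPartial.Query));
    }
}
```

Calling FragmentHelper.StripFragment on the first URL in the question should give the second one, i.e. the same address without #29809054. Note that GetLeftPart only works on absolute URIs.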

Because the share-style URLs only ever return a 302, I'd suggest capturing the URI that the 302 points to and then, as suggested above, requesting just the path and ignoring the fragment.

So you need some mechanism (which I'm just looking up!) to handle the 302 gracefully, followed by option 2.
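One way this could look, as an untested sketch: follow the redirect by hand and drop the fragment from each Location before requesting it. RedirectHelper, ResolveRedirect, GetFollowingRedirectsAsync and the maxRedirects limit are all names I've made up for illustration, and the code assumes the HttpClient was created with AllowAutoRedirect = false.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class RedirectHelper
{
    // Hypothetical helper: resolves a Location header against the current URL
    // and removes any #fragment from the result.
    public static Uri ResolveRedirect(Uri current, Uri location)
    {
        // Location may be relative, so resolve it against the current URL first.
        Uri absolute = location.IsAbsoluteUri ? location : new Uri(current, location);
        return new Uri(absolute.GetLeftPart(UriPartial.Query));
    }

    // Hypothetical helper: follows 3xx responses manually.
    // Assumes 'client' uses a handler with AllowAutoRedirect = false.
    public static async Task<HttpResponseMessage> GetFollowingRedirectsAsync(
        HttpClient client, Uri url, int maxRedirects = 5)
    {
        for (int i = 0; i <= maxRedirects; i++)
        {
            HttpResponseMessage response = await client.GetAsync(url);
            int status = (int)response.StatusCode;
            Uri location = response.Headers.Location;

            // Anything that is not a redirect with a Location header is the final answer.
            if (status < 300 || status > 399 || location == null)
            {
                return response;
            }

            url = ResolveRedirect(url, location);
            response.Dispose();
        }

        throw new HttpRequestException("Too many redirects.");
    }
}
```

With a client built from new HttpClientHandler { AllowAutoRedirect = false }, passing new Uri("https://stackoverflow.com/a/29809054/26086") to GetFollowingRedirectsAsync should end up requesting the fragment-free URL from option 2. The redirect limit is an arbitrary choice to avoid looping forever.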

Update: this looks relevant! How can I get System.Net.Http.HttpClient to not follow 302 redirects?

Update 2: Steve Guidi has a very important bit of advice in a comment here: https://stackoverflow.com/a/17758758/5351

In response to the advice that you need to use HttpResponseMessage.RequestMessage.RequestUri:

it is very important to add HttpCompletionOption.ResponseHeadersRead as the second parameter of the GetAsync() call


Disclaimer - I've not tried the above, this is just based on reading ;-)

rohancragg

Maybe you need to encode your URL with the HttpUtility class before sending the request; that way any special characters will be escaped.

using System.Web;

var url = $"https://myurl.com/{HttpUtility.UrlEncode("#1234567")}";
  • Thank you Vinick - I think I need another tactic to bend HttpClient into doing what I want, i.e. respond correctly when I give it a URL such as: https://stackoverflow.com/a/29809054/26086 – Dave Mateer Jan 03 '19 at 16:38