HttpClient - detect Content-Type

Question

I need to detect the type of content located at a specific URL, so I created a method that gets the Content-Type of the response. For small files and HTML pages it works without problems, but if the URL points to a big file, the request takes a long time: it fetches the entire content (file) in the background. Is it possible to cancel the request and return the result immediately after the Content-Type header is received?

My current implementation:

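    public static async Task<string> GetContentType(string url)
    {
        try
        {
            using (HttpClient client = new HttpClient())
            {
                // GetAsync(url) defaults to HttpCompletionOption.ResponseContentRead,
                // so it does not complete until the whole body has been downloaded
                var response = await client.GetAsync(url);
                if (!response.IsSuccessStatusCode)
                {
                    return null;
                }

                // ContentType is null if the server sends no Content-Type header
                return response.Content.Headers.ContentType?.MediaType;
            }
        }
        catch (HttpRequestException)
        {
            return null;
        }
    }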

I would imagine the functionality in that question is dependent on the server sending the response, though. It may not honour the request and just send you everything anyway. — Liam, Jan 19 '16 at 13:36
You cannot do this with `HttpClient`, but `HttpWebRequest` does allow you to. — Richard, Jan 19 '16 at 13:56
WebException cannot handle all the possible exceptions of the GetAsync method. You should consider a broader catch. See https://msdn.microsoft.com/en-us/library/windows/apps/dn298646.aspx?f=255&MSPPError=-2147217396 . — Mehrzad Chehraz, Jan 19 '16 at 13:57
@Richard See the answers here. What's wrong with their use of HttpClient? Also, if nothing's wrong, then this question isn't a duplicate. — Yam Marcovic, Jan 19 '16 at 14:02
@YamMarcovic Firstly it is a duplicate: getting only the headers, and thus extracting the `Content-Type` is what the other Q asks. Secondly I checked `HttpClient` for a `Get` overload that took an HTTP method parameter (but missed there is a different method that does). In any case the other Q answers this question (even if it does use a different type). — Richard, Jan 19 '16 at 14:06
@Richard - my question is not a duplicate... I'm using the HttpClient class — Dominik Palo, Jan 19 '16 at 14:06
@Richard The question, even in its title, deals with the HttpClient class. By your reasoning, you might as well have marked it as a duplicate since there's an answer with an equivalent Python implementation. — Yam Marcovic, Jan 19 '16 at 14:15
@YamMarcovic By that argument if a question has a title that names completely the wrong approach to a problem all answers must stick with the wrong approach? I am unmoved; if you have a real problem with this the right place to object is on meta. — Richard, Jan 19 '16 at 14:42
@Richard The question's title asks about a specific class (which is a correct one to use). The content deals with that same class. How do we get from that to saying that a question dealing with another class entirely is a duplicate? — Yam Marcovic, Jan 19 '16 at 14:45
@YamMarcovic (This will be my last comment: see end of my previous comment.) Because it answers the "how do I just get the headers on a HTTP request" question. Just because the Q has a starting point for a solution does not mean it is the only starting point. Yes, `HttpClient` is a possible approach; but so is `HttpWebRequest`. (Don't forget there are many Q's on [SO] falling into the "X Y Problem".) — Richard, Jan 19 '16 at 14:50
I agree with Yam, as long as this question is "HttpClient - detect Content-Type", this is not a duplicate. I'm migrating from WebClient to HttpClient, so the redirection to the WebClient solution did not make any sense to me. — kilnan, Sep 09 '16 at 20:05

Mehrzad Chehraz · Answer 1 · 2016-01-19T13:47:32.660

Since not all servers respond to a HEAD request as expected, you can instead use the `GetAsync(String, HttpCompletionOption)` overload and pass `HttpCompletionOption.ResponseHeadersRead` as the second argument. The method then returns immediately after the headers are received.

The `completionOption` parameter, from MSDN:

An HTTP completion option value that indicates when the operation should be considered completed.

`ResponseHeadersRead`, from MSDN:

The operation should complete as soon as a response is available and headers are read. The content is not read yet.

Once you have the headers, you can dispose the response or the client if you don't need the content.

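    using System.Net.Http;
    using System.Net.Http.Headers;

    // Send request; with ResponseHeadersRead, GetAsync completes once the headers are read
    HttpResponseMessage response = await client.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead);

    // Check status code
    if (!response.IsSuccessStatusCode)
    {
        // Error...
    }

    // Get content headers (HttpContentHeaders, from System.Net.Http.Headers)
    HttpContentHeaders contentHeaders = response.Content.Headers;

    // Make a decision and dispose the client if you wish
    // (example condition only: skip anything that is not HTML)
    if (contentHeaders.ContentType?.MediaType != "text/html")
    {
        client.Dispose();
    }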
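For reference, here is a sketch of the question's method rewritten around `ResponseHeadersRead`. It rests on assumptions that are not part of the answer above: the names `ContentTypeProbe` and `GetContentTypeAsync` are hypothetical, the caller supplies a long-lived `HttpClient` instead of creating one per call, and a timeout (`TaskCanceledException`) is treated like any other failure. Rather than disposing the client, it disposes the response through `using`, which releases the connection without buffering the body.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class ContentTypeProbe
{
    // Hypothetical helper: returns the media type (e.g. "application/pdf"),
    // or null if the request fails or the server sends no Content-Type.
    public static async Task<string> GetContentTypeAsync(HttpClient client, string url)
    {
        try
        {
            // ResponseHeadersRead: complete as soon as the headers are read.
            // Disposing the response at the end of the using block closes the
            // content stream, so the (possibly large) body is never buffered.
            using (HttpResponseMessage response = await client.GetAsync(
                url, HttpCompletionOption.ResponseHeadersRead))
            {
                if (!response.IsSuccessStatusCode)
                {
                    return null;
                }

                // ContentType is null when the server omits the header.
                return response.Content.Headers.ContentType?.MediaType;
            }
        }
        catch (HttpRequestException)
        {
            // DNS failure, connection refused, etc.
            return null;
        }
        catch (TaskCanceledException)
        {
            // Thrown when HttpClient.Timeout elapses.
            return null;
        }
    }
}
```

Called as `string type = await ContentTypeProbe.GetContentTypeAsync(sharedClient, url);`, it yields the server's media type or `null` on failure.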

"Since not all servers respond to the HEAD request as expected" -- References? It's a big deal since you'll be needlessly using up bandwidth for both you and the server. — Yam Marcovic, Jan 19 '16 at 13:46

Yam Marcovic · Answer 2 · 2016-01-19T13:41:04.083

Now how about

    var response = await client.SendAsync(
        new HttpRequestMessage(HttpMethod.Head, url)
    );
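Wrapped into a method like the question's, the `HEAD` request might look like the sketch below. The names `HeadProbe` and `GetContentTypeViaHeadAsync` are hypothetical, and the `HttpClient` is assumed to be supplied by the caller. A `HEAD` response is meant to carry the same headers a `GET` would but no body, and `HttpClient` still exposes `Content-Type` through `response.Content.Headers`.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class HeadProbe
{
    // Hypothetical helper: asks the server for headers only, via HEAD.
    public static async Task<string> GetContentTypeViaHeadAsync(HttpClient client, string url)
    {
        try
        {
            using (var request = new HttpRequestMessage(HttpMethod.Head, url))
            using (HttpResponseMessage response = await client.SendAsync(request))
            {
                if (!response.IsSuccessStatusCode)
                {
                    return null;
                }

                // Content headers are populated even though a HEAD response has no body.
                return response.Content.Headers.ContentType?.MediaType;
            }
        }
        catch (HttpRequestException)
        {
            return null;
        }
        catch (TaskCanceledException)
        {
            // HttpClient.Timeout elapsed
            return null;
        }
    }
}
```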

answered Jan 19 '16 at 13:35 by Yam Marcovic (edited Jan 19 '16 at 13:41)

@Liam From RFC 2616, "9.4 HEAD The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response" – Yam Marcovic Jan 19 '16 at 13:38
So what you're getting at is the `HttpMethod.Head` bit. You didn't explain that very well... – Liam Jan 19 '16 at 13:39
@Liam It's a short and self-explanatory piece of code. – Yam Marcovic Jan 19 '16 at 13:43
Depending on the implementation on the web-server, it might not support `HEAD`. A fallback on `GET` is recommended. – mausworks Jan 19 '16 at 13:46
@diemaus Do you know of a common one that actually doesn't support HEAD? – Yam Marcovic Jan 19 '16 at 13:48
Some APIs I've worked with over the years have not been able to abide by the `HEAD` spec. – mausworks Jan 19 '16 at 13:57
@diemaus Interesting, though I wonder if the extra code is needed in this case. At any rate, an approach of using it as a fallback is much better than being the default, since that way you're giving both you and the server the opportunity to save wasted bandwidth. – Yam Marcovic Jan 19 '16 at 13:59
@YamMarcovic It certainly depends on the implementation. If it's implemented towards a web-service which is *known* to abide by the `HEAD`-spec, then by all means; a fallback is not needed. No bandwidth needs to be wasted here though, since you check the response before doing the `GET`-request. – mausworks Jan 19 '16 at 14:01
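The HEAD-first approach with a GET fallback discussed in these comments could be sketched as follows. Several choices here are assumptions, not something the commenters specified: `405 Method Not Allowed` and `501 Not Implemented` are taken to mean "HEAD is not supported", a successful `HEAD` that omits `Content-Type` also triggers the fallback, and the fallback `GET` uses `ResponseHeadersRead` so it still stops after the headers. All names (`FallbackProbe`, `ShouldFallBackToGet`, `GetContentTypeAsync`) are hypothetical.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class FallbackProbe
{
    // Assumption: these status codes signal that the server does not support HEAD.
    public static bool ShouldFallBackToGet(HttpStatusCode status) =>
        status == HttpStatusCode.MethodNotAllowed || status == HttpStatusCode.NotImplemented;

    public static async Task<string> GetContentTypeAsync(HttpClient client, string url)
    {
        try
        {
            // 1) Try HEAD first: headers only, no body transferred.
            using (var head = new HttpRequestMessage(HttpMethod.Head, url))
            using (HttpResponseMessage response = await client.SendAsync(head))
            {
                string mediaType = response.Content.Headers.ContentType?.MediaType;
                if (response.IsSuccessStatusCode && mediaType != null)
                {
                    return mediaType;
                }
                if (!response.IsSuccessStatusCode && !ShouldFallBackToGet(response.StatusCode))
                {
                    return null; // a real error such as 404; GET would not help
                }
            }

            // 2) Fall back to GET, but complete as soon as the headers are read.
            using (HttpResponseMessage response = await client.GetAsync(
                url, HttpCompletionOption.ResponseHeadersRead))
            {
                return response.IsSuccessStatusCode
                    ? response.Content.Headers.ContentType?.MediaType
                    : null;
            }
        }
        catch (HttpRequestException)
        {
            return null;
        }
        catch (TaskCanceledException)
        {
            // HttpClient.Timeout elapsed
            return null;
        }
    }
}
```

As mausworks notes, the GET is only sent after the HEAD response has been checked, so a well-behaved server never has to send the body.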
