
I'm using URL.openConnection() to download something from a server. The server says

Content-Type: text/plain; charset=utf-8

But connection.getContentEncoding() returns null. Why?

Bart van Heukelom
  • this related thread might help anyone else: http://stackoverflow.com/questions/9112259/obtaining-response-charset-of-response-to-get-or-post-request – Spoonface Dec 23 '12 at 10:39
  • Also, there is a good reason connection.getContentEncoding() returns null: it returns the "Content-Encoding" field of the HTTP header, which **is not** supposed to give you a character set. It is used, for instance, when the received data is compressed, and it tells you which transformation to undo so you can read the data. https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11 – jdarthenay Mar 14 '16 at 06:25

3 Answers


URLConnection.getContentEncoding() returns the value of the Content-Encoding header, not the charset given in Content-Type.

Code from URLConnection.getContentEncoding():

/**
 * Returns the value of the <code>content-encoding</code> header field.
 *
 * @return  the content encoding of the resource that the URL references,
 *          or <code>null</code> if not known.
 * @see     java.net.URLConnection#getHeaderField(java.lang.String)
 */
public String getContentEncoding() {
    return getHeaderField("content-encoding");
}

Instead, call connection.getContentType() to get the Content-Type value and extract the charset parameter from it. Here is sample code that does this:

String contentType = connection.getContentType(); // null if the header is absent
String charset = "";

if (contentType != null) {
    // Parameters follow the media type, separated by ';'
    for (String value : contentType.split(";")) {
        value = value.trim();

        if (value.toLowerCase().startsWith("charset=")) {
            charset = value.substring("charset=".length());
        }
    }
}

if ("".equals(charset)) {
    charset = "UTF-8"; // Assumption: fall back to UTF-8 when no charset is given
}
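Once you have the charset name, it has to be handed to the reader that decodes the response body. Below is a minimal sketch of that step, not a definitive implementation. The class name `CharsetDecoding` and the helpers `charsetOf` and `readAll` are hypothetical names made up for illustration, and falling back to UTF-8 is an assumption, as in the snippet above. It also strips optional quotes around the charset value and tolerates unknown charset names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDecoding {

    /** Hypothetical helper: finds the charset parameter of a Content-Type value. */
    public static Charset charsetOf(String contentType) {
        final String key = "charset=";
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Case-insensitive prefix match on "charset="
                if (p.regionMatches(true, 0, key, 0, key.length())) {
                    String name = p.substring(key.length()).trim();
                    // The value may be quoted, e.g. charset="utf-8"
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8; // Assumption: default when nothing usable is found
    }

    /** Hypothetical helper: decodes the whole stream with the given charset. */
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for connection.getInputStream() and connection.getContentType()
        byte[] body = "h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1);
        Charset cs = charsetOf("text/plain; charset=ISO-8859-1");
        System.out.println(readAll(new ByteArrayInputStream(body), cs));
    }
}
```

With a real connection you would pass `connection.getContentType()` to `charsetOf` and `connection.getInputStream()` to `readAll`.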
tronman
Buhake Sindi
  • These methods are overridden to return sane values in HttpURLConnection which the OP is most likely talking about, see http://goo.gl/wt0P – Waldheinz Oct 14 '10 at 14:46
  • the `substring()` argument should be `"charset=".length()+1` – bigstones Jul 26 '13 at 08:24
  • @bigstones, no. `substring()` uses zero-based indices, so the index `"charset=".length()` is already the first character after the `=` sign. Your version would skip that character, and it can throw a `StringIndexOutOfBoundsException` if there is no value after the `=` sign. – Buhake Sindi Jul 26 '13 at 08:37
  • ohh now I see what was wrong, I chained in the condition the `trim()` too, so when taking the substring I was doing that on the untrimmed one. Thank you and sorry for doubting! – bigstones Jul 26 '13 at 09:02

This is documented behaviour as the getContentEncoding() method is specified to return the contents of the Content-Encoding HTTP header, which is not set in your example. You could use the getContentType() method and parse the resulting String on your own, or possibly go for a more advanced HTTP client library like the one from Apache.
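To illustrate what the Content-Encoding header is actually for, here is a rough sketch that unwraps a gzip-compressed body based on that header. The class `ContentEncodingDemo` and its helpers are hypothetical names for illustration. Only `gzip` is handled, and passing every other value through unchanged is a simplifying assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ContentEncodingDemo {

    /** Hypothetical helper: wraps the raw body according to the Content-Encoding value. */
    public static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        // null means the header is absent; other values are passed through in this sketch
        return raw;
    }

    /** Test helper: gzip-compresses a byte array, standing in for a compressed response. */
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    /** Reads a stream to the end and returns its bytes. */
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for connection.getInputStream() and connection.getContentEncoding()
        byte[] compressed = gzip("hello".getBytes(StandardCharsets.UTF_8));
        InputStream body = decode(new ByteArrayInputStream(compressed), "gzip");
        System.out.println(new String(readFully(body), StandardCharsets.UTF_8));
    }
}
```

The character set is a separate concern: it still comes from Content-Type and applies to the bytes after decompression.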

Community
Waldheinz

As an addition to the answer from @Buhake Sindi: if you are using Guava, you can do the following instead of parsing manually:

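// MediaType.parse() rejects null, so check getContentType() first if the header may be absent
MediaType mediaType = MediaType.parse(httpConnection.getContentType());
// Note: this is Guava's com.google.common.base.Optional, not java.util.Optional
Optional<Charset> typeCharset = mediaType.charset();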
Juan M. Rivero