JSON Jackson + HTTPClient with german umlauts

Question

I'm having a problem with a JSON string that I fetch with the Apache HTTP client and that contains German umlauts.

Mapping the JSON string only works if it does not contain any German umlaut; otherwise I get a "JsonMappingException: Can not deserialize instance of [...] out of START_ARRAY".

The Apache HTTP client sends "Accept-Charset" set to "utf-8", but in the result I always get e.g. "\u00fc" instead of "ü". When I manually replace "\u00fc" with "ü", the mapping works perfectly.

How can I get a UTF-8 encoded JSON response from the Apache HTTP client? Or is the server output the problem?

params.setParameter(HttpProtocolParams.USE_EXPECT_CONTINUE, false);
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params, HTTP.UTF_8);
httpclient = new DefaultHttpClient(params);
HttpGet httpGetContentLoad = new HttpGet(url);
httpGetContentLoad.setHeader("Accept-Charset", "utf-8");
httpGetContentLoad.setParams(params);
response = httpclient.execute(httpGetContentLoad);
entity = response.getEntity();
String loadedContent = null;
if (entity != null)
{
   loadedContent = EntityUtils.toString(entity, HTTP.UTF_8);
   entity.consumeContent();
}
if (HttpStatus.SC_OK != response.getStatusLine().getStatusCode())
{
    throw new Exception("Loading content failed");
}
closeConnection();
return loadedContent;

And the json code is mapped here:

String jsonMetaData = loadGetRequestContent(getLatestEditionUrl(newspaperEdition));
Newspaper loadedNewspaper = mapper.readValue(jsonMetaData, Newspaper.class);
loadedNewspaper.setEdition(newspaperEdition);

Update 1: jsonMetaData is of type String and contains the fetched JSON.

Update 2:

This is the code I use to transform the JSON output to fit my needs:

public static String convertJsonLatestEditionMeta(String jsonCode)
{
    jsonCode = jsonCode.replaceFirst("\\[\"[A-Za-z0-9-[:blank:]]+\",\\{", "{\"edition\":\"an-a1\",");
    jsonCode = jsonCode.replaceFirst("\"pages\":\\{", "\"pages\":\\[");
    jsonCode = Helper.replaceLast(jsonCode, "}}}]", "}]}");
    jsonCode = jsonCode.replaceAll("\"[\\d]*\"\\:\\{\"", "\\{\"");
    return jsonCode;
}

Update 3: JSON conversion example:

JSON before conversion:

["Newspaper title",
    {
        "date":"20130103",
        "pages":
        {
            "1":{"ressort":"ressorttitle1","pdfpfad":"pathToPdf1","number":1,"size":281506},
            "2":{"ressort":"ressorttitle2","pdfpfad":"pathToPdf2","number":2,"size":281533},
            [...]
        }
    }
]

JSON after conversion:

{
    "edition":"Newspaper title",
    "date":"20130103",
    "pages":
    [
        {"ressort":"ressorttitle1","pdfpfad":"pathToPdf1","number":1,"size":281506},
        {"ressort":"ressorttitle2","pdfpfad":"pathToPdf2","number":2,"size":281533},
        [...]
    ]
}

Solution: I started using GSON as @Boris suggested, and the problem with umlauts is gone! Furthermore, GSON really seems to be faster than Jackson.

A workaround would be to replace the characters manually following this table:

Sign        Unicode representation

Ä, ä        \u00c4, \u00e4
Ö, ö        \u00d6, \u00f6
Ü, ü        \u00dc, \u00fc
ß           \u00df
€           \u20ac

By curiosity, if you try and read the tree (using `mapper.readTree`) to a `JsonNode`, does it work or do you also have an error? If an error, which one? — fge, Jan 10 '13 at 01:55
Also, `.readValue()` has a _lot_ of overloads -- what argument type is `jsonMetaData` here? — fge, Jan 10 '13 at 01:58
Hi, thanks for your quick response. Using `mapper.readTree` to get a `JsonNode` throws the same exception. jsonMetaData is of type String. Can the problem be related to the server, because it's not able to return UTF-8 encoded JSON data? — alex, Jan 10 '13 at 09:01

Boris Strandjev · Accepted Answer · 2013-01-10T13:31:59.800

2

Try parsing like this:

entity = response.getEntity();
Newspaper loadedNewspaper = mapper.readValue(entity.getContent(), Newspaper.class);

No reason to go through String, Jackson parses InputStreams directly. Also Jackson will automatically detect the encoding if you use my proposed approach.
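To illustrate the encoding point without Jackson, here is a minimal sketch that uses only the JDK. It decodes the same response bytes once as UTF-8 and once as ISO-8859-1. The class name `StreamDecodingDemo`, the helper `readAll`, and the sample payload are made up for illustration and are not part of the code in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StreamDecodingDemo {

    // Reads the whole stream and decodes its bytes with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical response body: a raw (unescaped) umlaut, encoded as UTF-8.
        byte[] body = "{\"city\":\"M\u00fcnchen\"}".getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in keeps the umlaut.
        System.out.println(readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8));

        // Decoding the same bytes as ISO-8859-1 turns the umlaut into two wrong characters.
        System.out.println(readAll(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1));
    }
}
```

Passing the stream straight to the parser avoids having to pick the charset by hand at this step.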

EDIT: By the way, consider using the GSON JSON parsing library. It is even faster than Jackson and easier to use. However, Jackson recently started parsing XML, too, which is a plus.

EDIT 2: Given all the details you have added, I suppose the problem lies in the server implementation of the services: umlauts do not need to be Unicode-escaped in JSON, since UTF-8 is its native encoding. Instead of manually replacing e.g. "\u00fc" with "ü", why don't you do it via regex?
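A regex-based replacement could look roughly like the sketch below, which uses only the JDK. The class name `UnicodeUnescaper` and the method `unescape` are hypothetical. As an assumption on my side, it only decodes escapes for non-ASCII characters, so that escaped quotes or control characters stay escaped and the JSON remains valid. It does not handle a literal backslash directly in front of a `\u` sequence. Note that JSON parsers normally decode such escapes themselves, so treat this as a workaround only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeUnescaper {

    // Matches a backslash, the letter 'u', and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces \\uXXXX sequences for non-ASCII characters with the character itself.
    static String unescape(String input) {
        Matcher matcher = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            int codeUnit = Integer.parseInt(matcher.group(1), 16);
            // Leave ASCII escapes (quotes, backslashes, control characters) untouched.
            String replacement = codeUnit < 0x80
                    ? matcher.group()
                    : String.valueOf((char) codeUnit);
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The input contains the six characters \, u, 0, 0, f, c rather than a real umlaut.
        System.out.println(unescape("{\"city\":\"M\\u00fcnchen\"}"));
    }
}
```

This covers every escaped character in one pass, so there is no need to maintain a replacement table per umlaut.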

edited Jan 10 '13 at 13:31

answered Jan 10 '13 at 09:28

Boris Strandjev


Using your proposed approach is not possible for me, because before the JSON string is mapped, it is converted via string replacement to fit my needs for object mapping. This is necessary because of a poor API on the server side. But EntityUtils.toString(entity, HTTP.UTF_8); should return a properly encoded string for Jackson, shouldn't it? GSON would be fine for me, but I'm curious why it's not working in the current configuration. – alex Jan 10 '13 at 10:02
@alex Whilst I accept your explanation, I would like to ask what kind of manipulations you do on the string. I am asking because I have a good feeling I would be able to help you do them even when using my proposed approach. I have been in situations similar to yours many times. Also, I still think that going through `String` is the wrong way to go. – Boris Strandjev Jan 10 '13 at 10:06
OK, sounds great. I do a transformation, using the function attached to my post above, to compensate for the API's poor understanding of object orientation so that I can map the output to my Java objects. – alex Jan 10 '13 at 10:12
@alex Thanks. Can I ask you to also post short examples of what the regexes do? As you can imagine, even though I can try to understand them, they are not so easy to comprehend. I believe an example will help me with that. – Boris Strandjev Jan 10 '13 at 10:18
@alex Dude, I do not know who thought of those services, but this is not a JSON service at all! – Boris Strandjev Jan 10 '13 at 12:01
I know, so I wrote a conversion wrapper. The API was formerly used as a web app for iPads, which explains the bad / non-existent object structure. After my conversion I do have valid JSON. If you put the output before conversion into JavaScript eval, you get an associative array with keys "1", "2", ..., which seems very unusual. But the API is given; I can't change it... – alex Jan 10 '13 at 12:56
@BorisStrandjev Thank you for the good answer! I am curious, however, as to why you think Gson would be faster? All numbers I have seen suggest the opposite. – StaxMan Jan 10 '13 at 17:44
