German characters encoding problem when data are loaded from database

Question

I'm creating a unit test that converts the object to JSON but I'm having some problems with some special characters:

String expectedResponse = gson.toJson(callReasonRepository.findAll());

I'm getting this: VerfÃ¼gung instead of Verfügung.

I have my project set to UTF-8. Any idea why this happens? Is there anything else I can provide?

Does this answer your question: https://stackoverflow.com/questions/3995559/json-character-encoding ? — Nikolay Shindarov, Nov 29 '19 at 17:05
Not really because I'm just calling the repository nothing else. — akv84882, Nov 29 '19 at 17:07
The issue is typically related to mixed encodings. Your program uses UTF-8 encoding? Did not know you can set an encoding on project level. You can set encodings for source files and resources. How/Where did you set an encoding? Is your `callReasonRepository` working correctly? There might be issues with saving UTF-8 encoded strings to a database using a different encoding. Where do you see the wrong characters? In the debugger? On the command line? In a log file? What encoding is used there (default system encoding? log file encoding?). — Jochen Reinhardt, Nov 29 '19 at 17:17

score 0 · Answer 1 · answered Nov 29 '19 at 17:50

I cannot be sure because I can only see what you show and not the underlying bytes, but I would say that you have just a display problem.

The German 'ü' character is Unicode U+00FC, which is encoded in UTF-8 as the byte pair b'\xc3\xbc'. But if your display expects Latin1 or its Windows variant cp1252, \xc3 is the code for 'Ã' and \xbc the one for '¼'. To make sure, you should write the output directly into a file and use a hex-capable editor (like the excellent vim) to check the actual byte content.
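To check those byte values without a hex editor, here is a minimal Java sketch (Java because the question uses Gson). The class and method names are made up for illustration, and ISO-8859-1 stands in for the wrong charset because it maps these two bytes the same way cp1252 does.

```java
import java.nio.charset.StandardCharsets;

class UmlautBytes {

    // Formats bytes as space-separated two-digit hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encodes the text as UTF-8, then decodes it with the wrong charset
    // (ISO-8859-1) to reproduce the garbled display.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 'ü' written as an escape so the source file encoding does not matter.
        String u = "\u00fc";
        System.out.println(toHex(u.getBytes(StandardCharsets.UTF_8))); // c3 bc
        // Print code points instead of characters, so the console encoding
        // cannot distort the result: U+00C3 is 'Ã', U+00BC is '¼'.
        misdecode(u).codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

Printing code points rather than the characters themselves keeps the check independent of whatever encoding the console uses.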

Put differently, the code does what you asked, but the displaying program does not honor the UTF-8 encoding, either because it cannot or because you forgot to declare it.
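If the display is a console or a captured log, one way to rule out the platform default charset on the Java side is to write through a stream that is explicitly UTF-8. This is only a sketch under that assumption; `Utf8Output` and `utf8Stream` are hypothetical names, and the program showing the output still has to interpret the bytes as UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Output {

    // Wraps a stream so that text is always encoded as UTF-8,
    // regardless of the platform default charset.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        out.println("Verf\u00fcgung");
    }
}
```

If the text still looks wrong with this in place, the bytes are correct and the viewer (terminal, IDE console, log viewer) is the one decoding them with another charset.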

Michał Ziober · Answer 2 · 2019-11-29T21:49:37.087

This problem appears when data are encoded with the UTF-8 charset but are read using windows-1252 (or ISO-8859-1). I created a JSON file encoded in UTF-8:

{
  "value": "Verfügung"
}

And read it as a Map using the code below:

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class GsonApp {

    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();

        Gson gson = new GsonBuilder().create();

        // Deliberately the wrong charset: the file is UTF-8, but we decode it as windows-1252.
        Charset wrongEncoding = Charset.forName("windows-1252");

        // Reading the UTF-8 file with the wrong charset reproduces the garbled text.
        try (InputStreamReader reader = new InputStreamReader(new FileInputStream(jsonFile), wrongEncoding)) {
            Map<?, ?> map = gson.fromJson(reader, Map.class);
            System.out.println(map);
        }

        // The same effect without a file: encode as UTF-8, decode as windows-1252.
        byte[] bytes = "Verfügung".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(bytes, wrongEncoding));
    }
}

The app above prints:

{value=VerfÃ¼gung}
VerfÃ¼gung

The GsonApp source file is also encoded in UTF-8.
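For comparison, here is a small sketch of the same experiment with the reader's charset passed in as a parameter, so the correct and the wrong decoding can be seen side by side. It works on an in-memory byte array instead of the JSON file and leaves Gson out; `ReaderCharsetDemo` and `readLine` are illustrative names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderCharsetDemo {

    // Reads the first line of the given bytes, decoding them with the given charset.
    static String readLine(byte[] data, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String expected = "Verf\u00fcgung";
        byte[] utf8 = expected.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in round-trips cleanly.
        System.out.println(readLine(utf8, StandardCharsets.UTF_8).equals(expected));      // true
        // Decoding with a single-byte charset turns the two bytes of 'ü' into two characters.
        System.out.println(readLine(utf8, StandardCharsets.ISO_8859_1).equals(expected)); // false
    }
}
```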

I guess that in your case UTF-8 encoded data is read with the default system charset, which is probably windows-1252. You load the data from a DB, so you probably need to set the encoding explicitly to UTF-8 in the connection string. See this example for a MySQL database: JDBC character encoding.
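As a rough illustration of what such a connection string could look like, the sketch below builds a MySQL JDBC URL with the Connector/J properties `useUnicode` and `characterEncoding`. The question does not say which database is used, so MySQL is an assumption here, and the host, port and database name are placeholders.

```java
class JdbcUrlSketch {

    // Builds a MySQL JDBC URL that asks the driver to use UTF-8 for character data.
    // Host, port and database are supplied by the caller; nothing is connected here.
    static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // "localhost", 3306 and "callcenter" are hypothetical values.
        System.out.println(mysqlUrl("localhost", 3306, "callcenter"));
    }
}
```

Other databases use different property names, so the driver documentation for the database actually in use is the place to check.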
