
I have a screen where a user sees an English word and types the equivalent translation in any language.

My database creation query:

CREATE DATABASE IF NOT EXISTS lang_db
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;

My table creation query:

CREATE TABLE lang_map (
    WORD        VARCHAR(2048) NULL,
    DESCRIPTION VARCHAR(2048) NULL
) CHARACTER SET utf8 COLLATE utf8_general_ci;

I receive the word and description as JSON, read them in Java, and then run a query to insert them into the table. But for languages like Chinese or Russian, the only thing that gets inserted is question marks (?).

MySQL version: 5.5, Java: 1.6

Update: Java code:

Controller handling the AJAX call:

@ResponseBody
public setChanges(@RequestBody JSONObject keyValueMap) throws Exception {
    return myService.setChanges(keyValueMap);
}

Service code:

List<LangMapping> langMappings = new ArrayList<LangMapping>();
for(Object keyObject : changedKeyValueMap.keySet()){ 
    String key = String.valueOf(keyObject) ;
    String description = (String) changedKeyValueMap.get(key);
    langMappings.add(buildLangMapping(key,localeCode,description)); //pojo
}
// the list built above is then inserted into the database table
pramodpxi
  • How do you read the characters in Java? I think you should show the Java code too. – Alex Oct 06 '17 at 05:41
  • I am inserting the received string directly into the database table, but that's not the issue: even when I insert into the table manually, only question marks are stored, not the accented characters. – pramodpxi Oct 06 '17 at 06:24
  • It sounds like your _connection_ was not "UTF-8". See "question marks" in [_this_](https://stackoverflow.com/questions/38363566/trouble-with-utf-8-characters-what-i-see-is-not-what-i-stored) for more debugging. – Rick James Oct 07 '17 at 23:21

2 Answers


You will have to make sure that you are using the proper encoding at every stage of your application. The easiest way usually is to use the same encoding everywhere, in your case UTF-8.

To debug your issue, you could follow these steps:

  1. Using a good text editor like Notepad++, open some of the JSON files you get, check if they are properly encoded (i.e. if the Russian / Chinese / whatever characters are shown correctly), and check if the editor has auto-detected the encoding of the JSON file as UTF-8.

    If you don't have the source JSON data as files, but get it as the response to some web request, then try to dump it into a file using tools like wget or curl, and examine that file as described above.

  2. After reading the JSON input with Java, dump out the input from within Java before processing it further. I don't know Java, but it is a general issue that you have to tell your programming language / libraries / file open functions how the input is encoded.

    If you don't do that, Java will probably assume the JSON input to be in some default encoding; if this assumption is wrong, the data read in will be garbage.

    Likewise, before dumping out what you have read in, tell Java how the output should be encoded. If dumping out to the console, make sure that the console uses the encoding you expect as well.

    Please note that this is also true for web applications which use the CGI mechanism (as nearly all do). Standard input and standard output can be considered normal files in this context.

    Of course, if Java's default encoding for file and standard I/O is UTF-8 already, you can leave out this step.

  3. If you haven't found the problem yet, tell your database driver that you are sending UTF-8 encoded data and that you want to receive UTF-8 encoded data. Since I don't use Java, I don't know how to do this, but I am sure that it is described in the documentation.

  4. If it still does not work, tell MySQL that the connection and client use UTF-8 (unless your driver already does that automatically once step 3 is in place). Use statements like SET SESSION character_set_x = 'utf8', where x stands for client, results or connection, respectively. You will have to do this each time, immediately after connecting to the database (again, only if your driver does not do it automatically when connecting, depending on its run-time or static configuration).
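To illustrate where the question marks in step 2 can come from, here is a minimal Java sketch. It is not the asker's code; the class and method names are made up for the example. It encodes a Russian word with UTF-8 and with ISO-8859-1 and decodes it again. A charset that cannot represent a character substitutes `?` when encoding, which matches the symptom described in the question.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the original application.
class QuestionMarkDemo {
    static final Charset UTF8 = Charset.forName("UTF-8");
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    /** Encodes text to bytes with the given charset and decodes it with the same charset. */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "привет", written with Unicode escapes so the source file encoding does not matter
        String russian = "\u043f\u0440\u0438\u0432\u0435\u0442";

        // UTF-8 can represent every character, so the text survives unchanged.
        System.out.println(roundTrip(russian, UTF8).equals(russian)); // prints "true"

        // ISO-8859-1 has no Cyrillic letters; each one is replaced by '?' while encoding.
        System.out.println(roundTrip(russian, LATIN1)); // prints "??????"
    }
}
```

If any stage between the browser and the table encodes the text with such a charset, the original characters are lost at that point and cannot be recovered later.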

If you follow those steps, you will hopefully find the problem. Let us know how it goes.
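For step 3, assuming the driver is MySQL Connector/J (the question does not say which driver is used), the encoding is usually requested through the `useUnicode` and `characterEncoding` connection properties in the JDBC URL. The sketch below only builds such a URL; the host and port are placeholders and the helper name is made up for the example.

```java
// Hypothetical helper, assuming MySQL Connector/J as the JDBC driver.
class JdbcUrlSketch {
    /** Builds a JDBC URL that asks Connector/J to talk to the server in UTF-8. */
    static String utf8Url(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // localhost:3306 is an assumed placeholder; lang_db is the database from the question.
        System.out.println(utf8Url("localhost", 3306, "lang_db"));
    }
}
```

The resulting string would be passed to `DriverManager.getConnection(url, user, password)`. As far as I know, the Connector/J documentation recommends configuring the encoding this way rather than issuing `SET NAMES` yourself, because the driver does not notice a character set changed by a query.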

Binarus

utf8 does not store all the characters of other languages like Russian, Chinese, German, etc. You can refer to "Differences between utf8 and latin1".

Yogesh
  • No, it is partially wrong. MySQL's utf8 can handle all European characters and _most_ Asian characters. MySQL's utf8mb4 is the same as UTF-8, which can handle everything. – Rick James Oct 07 '17 at 23:18
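The distinction in the comment above can be checked from Java. MySQL's utf8 stores at most three bytes per character, while utf8mb4 allows four. This minimal sketch (class and method names are made up for the example) reports whether a string contains any character that needs four UTF-8 bytes.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the original application.
class Utf8WidthCheck {
    static final Charset UTF8 = Charset.forName("UTF-8");

    /** Returns the largest number of UTF-8 bytes needed by any single code point in text. */
    static int maxBytesPerCodePoint(String text) {
        int max = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            int length = new String(Character.toChars(codePoint)).getBytes(UTF8).length;
            max = Math.max(max, length);
            i += Character.charCount(codePoint); // skip both halves of a surrogate pair
        }
        return max;
    }

    /** True if every character fits in the three bytes that MySQL's utf8 allows. */
    static boolean fitsMysqlUtf8(String text) {
        return maxBytesPerCodePoint(text) <= 3;
    }

    public static void main(String[] args) {
        System.out.println(fitsMysqlUtf8("\u043f\u0440\u0438\u0432\u0435\u0442")); // Russian: true
        System.out.println(fitsMysqlUtf8("\u4e2d\u6587"));                         // Chinese: true
        System.out.println(fitsMysqlUtf8("\uD83D\uDE00"));                         // emoji U+1F600: false
    }
}
```

Russian and common Chinese characters fit in utf8, so the table definition in the question is not what turns them into question marks; only text with four-byte characters would need utf8mb4.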