I'm writing a simple insert/select "hello world" application (I'm new to Java but am trying to help my 14-year-old son with his first project) and have run into an issue: non-ASCII (Russian) strings are saved to a MySQL table in the wrong encoding. I have already checked:

  • Schema and table collation: utf8_general_ci
  • Code file encoding is UTF-8 (written in VS Code)

I'm using the official MySQL Connector/J from the Oracle website. The code itself:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class Test {
    public static void main(String[] args) {
        try {
            // Load the Connector/J driver class explicitly.
            Class.forName("com.mysql.cj.jdbc.Driver");
        } catch (ClassNotFoundException ex) {
            System.out.println("Error loading the JDBC driver: " + ex.getMessage());
            return;
        }

        String url = "jdbc:mysql://demo.server.ru/project1?user=...&password=...&characterEncoding=utf8&useUnicode=true";
        String sql = "INSERT INTO Pers1 (FirstName, LastName, Phone) VALUES (?, ?, ?)";

        // try-with-resources closes the statement and the connection automatically.
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement stmt1 = conn.prepareStatement(sql)) {

            stmt1.setString(1, "Иван");
            stmt1.setString(2, "Ромашкин");
            stmt1.setString(3, "+79115544788");

            stmt1.executeUpdate();
        } catch (SQLException ex) {
            // Report the error details returned by the driver.
            System.out.println("MySQL error: " + ex.getMessage());
            System.out.println("SQLState: " + ex.getSQLState());
            System.out.println("VendorError: " + ex.getErrorCode());
        }
    }
}

I have also tried to explicitly encode the string data as UTF-8 (although it should be UTF-8 from the very beginning). But I still find something like Ð�Ð½Ñ‚ÑƒÐ°Ð½ in the table! Please tell me, what is wrong with all this?

Igor Pigin
  • How certain are you that the literals in your source code are being correctly compiled? I'd start with that - print out each UTF-16 code unit from one of them, as integers, and check that. – Jon Skeet Oct 27 '21 at 12:13
  • Are there different ways to compile literals in Java? 8-\ As I told you I'm new to it so I compiled it just with "javac myClass.java" – Igor Pigin Oct 27 '21 at 12:29
  • That's precisely why I'm suggesting that you log the UTF-16 chars (e.g. using `charAt` to get at each char in turn, then cast to an `int` and log that). You can specify the encoding that `javac` uses with `javac -encoding utf8 Test.java` – Jon Skeet Oct 27 '21 at 13:12
  • Jon, thank you very much! "javac -encoding utf8 Test.java" solved the problem! You really saved my brain from blowing up today... – Igor Pigin Oct 27 '21 at 13:27
  • Does `Антуан` look better? When "double-encoded" that comes out as `Антуан` – Rick James Nov 11 '21 at 03:25
  • Note that `Р` is the Cyrillic "ER", not the Latin `P`. – Rick James Nov 11 '21 at 03:28
  • See "double encoding" (and maybe other things) in https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Nov 11 '21 at 03:29
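
The "double encoding" described in these comments can be reproduced in isolation. The following is a minimal sketch, not the asker's code: the class `MojibakeDemo` and its methods `misdecode` and `repair` are hypothetical names, and ISO-8859-1 stands in for whatever single-byte default encoding `javac` picked up (on the asker's machine it was probably windows-1252, which is an assumption). The Cyrillic sample is written with Unicode escapes so that the file compiles identically under any source encoding.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Simulates a compiler reading UTF-8 source bytes with a single-byte charset:
    // every byte of the UTF-8 sequence becomes its own char.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1); // assumed stand-in charset
    }

    // Reverses the damage: recover the original bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u0418\u0432\u0430\u043D"; // the first name from the question
        String garbled = misdecode(original);
        // Each Cyrillic letter takes two bytes in UTF-8, so 4 chars turn into 8.
        System.out.println(original.length() + " chars became " + garbled.length());
        System.out.println(repair(garbled).equals(original)); // prints true
    }
}
```

With ISO-8859-1 the round trip is lossless. With windows-1252 it may not be, because a few byte values (0x90, for example) have no mapping there, which would be consistent with the replacement character visible in the question's sample.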

2 Answers

utf8_general_ci

That's not UTF8. MySQL is a lying liar that lies.

utf8mb4 is actual UTF-8. utf8 is short for utf8mb3, which is a ridiculous name, as that is not full UTF-8: it stores at most 3 bytes per character.

Address that issue. Once that's fixed, your code is no longer the problem. If you're seeing 'weird' characters, whatever pipeline you have that takes data from MySQL and shows it on screen has a problem, not your Java code.

rzwitserloot
  • Thanks for the idea, but it didn't work for me :-( I have dropped the schema and created a new schema and tables. They have "ENGINE=InnoDB DEFAULT CHARSET=utf8mb4" now. But the data inserted by the code above still looks corrupted... – Igor Pigin Oct 27 '21 at 12:49
  • But... Cyrillic is fully handled by utf8mb3 (aka MySQL's utf8). So this answer, though valid, is not relevant to the question. – Rick James Nov 11 '21 at 03:26

Thanks to Jon Skeet, the right answer is this: Java source code containing non-ASCII string literals should be compiled with the extra -encoding flag, so that javac reads the file as UTF-8 instead of the platform default encoding:

javac -encoding utf8 Test.java
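
To check whether the literals were compiled correctly, the diagnostic Jon Skeet suggested in the comments can be sketched roughly as below. The class name `CodeUnitDump` and the method `codeUnits` are hypothetical, chosen only for this illustration. The sample is written with Unicode escapes so that the sketch itself does not depend on the source encoding; to test your own compiler settings, replace it with a raw Cyrillic literal. A correctly compiled four-letter Cyrillic name should give four values in the 1024 to 1279 range, while a mis-decoded one typically gives eight values.

```java
class CodeUnitDump {
    // Returns the UTF-16 code units of s as space-separated decimal integers.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append((int) s.charAt(i)); // cast the char to int to see its numeric value
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The first name from the question, written with escapes (an illustrative choice).
        String sample = "\u0418\u0432\u0430\u043D";
        System.out.println(codeUnits(sample)); // prints 1048 1074 1072 1085
    }
}
```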
Igor Pigin