I am trying to save some Indian-language content (Hindi) from a website into a column in a MySQL database. I am using Spring Boot and JPA to scrape the website and write the content.

Everything works fine on my local system (OSX). The same code is deployed on Ubuntu 16.04 LTS, where it works fine except that the relevant DB column shows '????'. I presumed it might be a problem with the COLLATION and CHARACTER SET of MySQL on the prod instance.

Indeed, the MySQL version on my local machine was 8.x, and the COLLATION on the local DB was utf8mb4_0900_ai_ci. The version on prod was 5.6, with the COLLATION being utf8_x_x. Apparently utf8mb4_0900_ai_ci is not available in versions prior to 8.x.

I tried changing the COLLATION of the columns (the entire schema, for that matter) on my local machine to utf8mb4_general_ci, as well as utf8mb4_unicode_ci. Both work fine on the local DB. But the same collations, utf8mb4_general_ci and utf8mb4_unicode_ci, still give '?????' in the prod environment!

Could this be a problem other than 'COLLATION'? I am able to print the Hindi content on the terminal, both locally and on the Ubuntu deployment. The problem is unlikely to be the client, as the same client (DataGrip) shows the local schema content correctly and the Ubuntu DB content as question marks.
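
One way to see how '?' can appear independently of collation is to reproduce it in plain Java. The sketch below is only an illustration, not a diagnosis of this setup: it assumes that somewhere between the application and the column the text passes through a character set that cannot represent Devanagari, and uses ISO-8859-1 as a stand-in for such a character set. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Encode the text with the given charset and decode it again with the same
    // charset, imitating a round trip through a channel limited to that charset.
    public static String roundTrip(String text, Charset charset) {
        // String.getBytes replaces characters the charset cannot represent
        // with the charset's replacement byte, which is '?' for ISO-8859-1.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The word "Hindi" in Devanagari, written with escapes so the source
        // file's own encoding does not matter.
        String hindi = "\u0939\u093F\u0902\u0926\u0940";

        // UTF-8 can represent every character, so the text survives unchanged.
        System.out.println(roundTrip(hindi, StandardCharsets.UTF_8).equals(hindi));

        // ISO-8859-1 cannot, so each of the five characters becomes '?'.
        System.out.println(roundTrip(hindi, StandardCharsets.ISO_8859_1));
    }
}
```

Once the characters have been replaced this way, the original text is gone: changing the collation or character set of the column afterwards cannot bring it back, so the affected rows would need to be written again.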

I have been googling for two days. Any help would be appreciated.

Abhishek Prabhat
  • Is your client connecting with the same driver? Have you verified your DB data using different clients to isolate the problem? – Siraj K May 29 '18 at 12:17
  • Collation, not relevant to "????". Character set, yes -- but there are several places. See "question mark" in https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James May 29 '18 at 17:42
  • @Rick I believe that the character set gets locked by the collation i.e. collation utf8mb4_x => character set is utf8mb4. The column (and the table) has the correct character set - utf8mb4. Couldn't get the content to become properly visible despite 'Set Name ...'. Resorted to brute force. Updated db to 8.x. Got character set as utf8mb4_0900_ai_ci. Things working. Thanks for help. – Abhishek Prabhat Jun 02 '18 at 07:57
  • @AbhishekPrabhat - The bytes in the client and the connection parameters are also important. – Rick James Jun 02 '18 at 14:48
  • For others who encounter this issue, you can try this (worked for me). On the database: ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; This only affects tables created from now on, NOT existing tables. For those you need: ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; Source - https://www.digitalocean.com/community/questions/any-way-to-disable-charset-handshake-and-change-default-server-charset-in-managed-mysql – lingar Jun 17 '20 at 11:37
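
The comments above point at the connection parameters as well as the column definition. As a rough sketch of how that side could be checked from Java, the helper below appends MySQL Connector/J's characterEncoding property to a JDBC URL and reads the server's character_set_* variables for an open connection. The class name, host and database name (scraper_db) are hypothetical, and whether this property alone is enough depends on the driver and server versions, so treat it as a starting point rather than a confirmed fix. In a Spring Boot application the URL would typically be the value of spring.datasource.url.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

public class CharsetDiagnostics {

    // Append characterEncoding=UTF-8 to a JDBC URL unless the URL already
    // sets that property.
    public static String withUtf8(String jdbcUrl) {
        if (jdbcUrl.contains("characterEncoding=")) {
            return jdbcUrl;
        }
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "characterEncoding=UTF-8";
    }

    // Collect the character_set_* variables as seen by this connection
    // (for example character_set_client and character_set_connection).
    public static Map<String, String> charsetVariables(Connection connection) throws SQLException {
        Map<String, String> variables = new LinkedHashMap<>();
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery("SHOW VARIABLES LIKE 'character_set%'")) {
            while (rows.next()) {
                // Column 1 is the variable name, column 2 its value.
                variables.put(rows.getString(1), rows.getString(2));
            }
        }
        return variables;
    }

    public static void main(String[] args) {
        // Hypothetical URL; no database connection is opened here.
        System.out.println(withUtf8("jdbc:mysql://localhost:3306/scraper_db"));
    }
}
```

Comparing the output of charsetVariables on the local and prod connections would show whether the two environments differ in anything other than the column's collation.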

0 Answers