0

I have a server running MySQL (tomcat, java spring) and recently upgraded the entire database to utf8mb4.

However, I'm now noticing characters we've likely copied from Word are causing issues - like "smart quotes" in the data are showing as ’ in the app's UI. Similar issue seen here.

Is there a function I can create in MySQL to just replace these Windows 1252-specific characters with UTF-8 equivalents moving forward? Or another solution to do this dynamically?

The goal is to avoid the issue for all users who try to save information copied from MS Word and other similar sources in the future. Such data may come from editing in the UI, bulk imports, etc. so I'd like the adjustment in the back-end to apply to all of these use cases where data is inserted into the db.

--

To fix the immediate issue, I was able to run a couple of queries to change the left and right smart quotes similar to this: (source)

UPDATE artist SET artist_blurb = REPLACE(artist_blurb, ”’, ”’)
    WHERE artist_blurb LIKE ‘%’%’;

SHOW VARIABLES LIKE 'char%'; result is: | Variable_name | Value |
+--------------------------+----------------------------+ 
| character_set_client | utf8 | 
| character_set_connection | utf8 | 
| character_set_database | utf8mb4 | 
| character_set_filesystem | binary | 
| character_set_results | utf8 | 
| character_set_server | latin1 | 
| character_set_system | utf8 |
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • `’` is Mojibake. How did you "convert" the database? Perhaps the wrong way. Please provide the queries used for such, plus `SHOW VARIABLES LIKE 'char%';`, `SHOW CREATE TABLE`. And do `SELECT HEX(col) ...` so we can see what is currently stored. – Rick James Jan 27 '21 at 00:16
  • ’ does not show in the database. The " is what shows in the database itself. Everything was converted properly via dropping and recreating. The problem is that smart quote symbol is from Microsoft and not UTF-8. – Kevin Malone Jan 29 '21 at 18:54
  • SHOW VARIABLES LIKE 'char%'; result is: | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | – Kevin Malone Jan 29 '21 at 18:56
  • Use hex() to see what is really in the column. – Rick James Jan 29 '21 at 20:16
  • HEX for left quote: E2809C & for right quote: E2809D – Kevin Malone Feb 01 '21 at 17:32
  • If you are seeing hex E2809C` in the table, but jibberish when displayed, then you have "Mojibake"; see https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Feb 01 '21 at 17:42
  • Thanks Rick. I've done everything listed there as solutions so I'm thinking the answer to my original question - if this can be prevented when copying text from tools like Word into the db - is "no". – Kevin Malone Feb 02 '21 at 19:50
  • It should be possible to copy and paste. But there may be about 5 places where utf8 must be configured. Probably only 1 is incorrectly set now. – Rick James Feb 02 '21 at 20:28

0 Answers0