0

I started working on a wordpress on my dev machine. mysql version is 5.6, and worpdress is 4.7 so its already using the utf8mb4_unicode_520_ci encoding if it detects its possible.

My problem is that on my hosting (mysql 5.5) utf8mb4_unicode_520_ci is not recognized as a valid encoding. So I'm trying to target utf8mb4_unicode_ci encoding as my hosting knows about this one, and if I understand correctly, this would - in opposition to going to utf8 - allow me to keep the 4 bytes.

I tried several different combinaison of encoding and collation set up for the db, but nothing successful (from here How to convert an entire MySQL database characterset and collation to UTF-8?).

I tried several combination of encoding and collation in the wp-config, but nothing.

Everything that is coming from the database (like post titles and post contents displays badly encoded char for all diatrics, anything else is displayed appropriately )

menu label from the database display incorrectly, where the hardcoded/translated label display correctly

I think I need to convert the actual content of the database, changing charset and collation does not seems to be enough.

I found this but it does not address my problem directly, or if it does I missed it.

Any help would be appreciated

————————————————————————————————

UPDATE :

here is the precise procedure I went through:

Initial situation:

I installed a wordpress (4.6.1) locally (on my dev machine, mysql 5.6.28). I worked on the theme and plugin locally

(at this moment I have, locally, a database that is utf8_general_ci and tables that are utf8mb4_unicode_520_ci

Problem:

I want to deploy my wordpress on my hosting (mysql: 5.5 - db collation seems to be utf8mb4_unicode_ci). I mysqldump the db locally, then try to import it on my hostings' phpmyadmin. This gives error :

Unknown collation: 'utf8mb4_unicode_520_ci'

solution 1 change the tables charset to utf8mb4_unicode_ci:

On my hosting sql server, utf8mb4_unicode_520_ci is not available and I can't get a more recent version of mysql.

utf8mb4_unicode_ci seems like the closest and is available on my hosting sql server.

from various so question, I adapt a bash script to change charset and collation of my tables

for tbl in wp_sij2017_commentmeta wp_sij2017_comments wp_sij2017_cwa wp_sij2017_links wp_sij2017_options wp_sij2017_postmeta wp_sij2017_posts wp_sij2017_term_relationships wp_sij2017_term_taxonomy wp_sij2017_termmeta wp_sij2017_terms wp_sij2017_usermeta wp_sij2017_users wp_sij2017_woocommerce_api_keys wp_sij2017_woocommerce_attribute_taxonomies wp_sij2017_woocommerce_downloadable_product_permissions wp_sij2017_woocommerce_order_itemmeta wp_sij2017_woocommerce_order_items wp_sij2017_woocommerce_payment_tokenmeta wp_sij2017_woocommerce_payment_tokens wp_sij2017_woocommerce_sessions wp_sij2017_woocommerce_shipping_zone_locations wp_sij2017_woocommerce_shipping_zone_methods wp_sij2017_woocommerce_shipping_zones wp_sij2017_woocommerce_tax_rate_locations wp_sij2017_woocommerce_tax_rates; do
 mysql --execute="ALTER TABLE wp_sij_2017_original_copy.${tbl} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
done

I run this script on the local db I now have all my tables set to collation utf8mb4_unicode_ci

My db collation is still utf8

I mysqldump the db, then import it to my hosting and...

Import is successful.

I search and replace siteurl in the db. I then visit the online website, I got SOME diatrics that renders a "question mark char"

Any text coming from the db has decoding issue AT SOME POINT

The source/html markup also has those "question mark char"

I have no idea where to look or what to do next

Community
  • 1
  • 1
Denis Florkin
  • 33
  • 1
  • 8
  • Your last link is broken. What are you actually doing? Exporting from one system and importing in the other? If so what tools to you use to do so? Wordpress does automatic collation/charset changes/upgrades. As for your example, did you verify what kind of characters are in the source of the page and what encoding is being used? – Seth Jan 25 '17 at 11:55
  • Hi @Seth, I updated the last link (which is about mysql CONVERT/CAST function). What I'm trying to do is bring my wordpress from local dev to online. How I'm doing it: I mysqldump the database in my local machine terminal and then import reuslting file through phpmyadmin and then proceed to do a searchAndreplace for the siteurl value. The problem is that the encoding on my local machine if not recognize by my hosting sql server. – Denis Florkin Jan 25 '17 at 12:28
  • @seth I'm not sure what you mean by `the source`. The characters in the database are correctly formated when consulted from phpmyadmin, wether on my local machine or on the hosting server. In the html that is served by my wp on my hosting all the diatrics are the question mark character. The charset meta tag in my html is utf8. – Denis Florkin Jan 25 '17 at 12:31
  • Exporting in compatibility mode (either mysqldump option `--compatible=mysql4` or the dropdown value `mysql40` in phpmyadmin) does not solve the problem ( from this question: http://stackoverflow.com/questions/29916610/1273-unknown-collation-utf8mb4-unicode-ci-cpanel/29939906#29939906 ) – Denis Florkin Jan 25 '17 at 13:45
  • Source as in Sourcecode. If it would display fine in the source but not on the display you could assume that something funny is going on in regards to the HTML meta tag encoding and the actual file encoding. Especially if the text is displaying in phpMyAdmin on your target system I'd rather assume that there is some kind of error while reading from the DB and handling the string using PHP. – Seth Jan 25 '17 at 13:57

2 Answers2

0

Clarification: CHARACTER SETs utf8 and utf8mb4 specify how characters are encoded into bytes. COLLATIONs *_unicode_*, etc, specify how those character compare.

The encoding for utf8mb4_unicode_ci and utf8mb4_unicode_520_ci are the same because they are encoded in the character set utf8mb4.

"database that is utf8_general_ci and tables that are utf8mb4_unicode_520_ci" -- that probably means that new tables in that database, unless specifically stated, will be CHARACTER SET utf8 COLLATION utf8_general_ci. That is the database setting is just a default for CREATE TABLE. Since your tables are already CHARACTER SET utf8mb4 COLLATION utf8mb4_unicode_520_ci, the database default is not relevant to them.

As long as the CHARACTER SET stays utf8mb4, no Emoji, Chinese, etc will be lost or otherwise mangled.

Do not use mysql40; it did not know about any CHARACTER SETs. Do not use CONVERT or CAST. Etc.

I assume the 520 is coming from the output of mysqldump? Do you have an editor that can handle a file that big? If so, simply edit it to change utf8mb4_unicode_520_ci to utf8mb4_unicode_ci throughout. Then load the dump. Problem solved?

Your fix

You did ALTER ... CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci on your local machine. That is probably an even better way -- since it will put your dev and prod machine in line with each other. That should have worked. Don't worry about what the "database" claims.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Hi @Rick, thanks for the clarification on charset and collation, it ended up helping :). I did as you suggested (search and replace the text in the dump) but this wasn't enough (I'd already tried that). BUT What made it work was setting the charset in my wp-config to utf8 and the collation to nothing (so 'default' I guess - I don't know how wp handle that). ``` define('DB_CHARSET', 'utf8'); // define('DB_COLLATE', 'utf8mb4_unicode_ci'); ``` – Denis Florkin Jan 26 '17 at 08:13
  • I just tried uncommenting the define('DB_COLLATE','utf8mb4_unicode_ci'), and it does not seems to cause issue. I'm still pretty sure I'm missing some understanding of what went/is going on. But anyway this is now working. Thought I'm not sure if I should answer my own question as I don't really understand why the solution works today... And @rick's answer seems like the closest to the 'right' answer, but I'm sure I tried this yesterday, and it wasn't working then, so maybe there's something else at play here and I missed it. – Denis Florkin Jan 26 '17 at 08:29
0

I'm find 'utf8mb4_unicode_520_ci' and replace with 'utf8mb4_unicode_ci' in .sql file. Its simplest why to solve this.

Ali Sufyan
  • 75
  • 1
  • 8