I have a form that needs to accept special font characters and write them to a database table. I believe the encoding is set correctly at the page/form level, but when the field is written to the database the characters get changed to some other encoding. Other SO answers seem to indicate that setting the encoding to UTF-8 is the answer, which I've done.

Now, if I copy and paste the characters below directly into the database table, it holds them just fine, as shown. The problem occurs only when I write them to the table from the form, or when I retrieve them for display in a web page.

Example characters: ⓄⒼקร

The web page is set as: <meta charset="utf-8">

The form tag includes attribute: accept-charset="UTF-8"

The PHP just before the INSERT has: $_POST['tag']=utf8_encode($_POST['tag']);

I have not had to write/encode these types of font/special characters before, so what am I doing wrong here?

DMSJax
  • How do you know it's getting changed? What client are you using to read it? What's the encoding on said client? – shmosel Jan 10 '17 at 23:21
  • If I copy/paste the characters above into the form field and submit it, I get other ASCII characters written to the table. But if I paste those same characters into the table directly through phpMyAdmin, then they write and display correctly in phpMyAdmin. If they are retrieved for display in a web page through a mysqli query, the browser gives me ASCII or similar. – DMSJax Jan 10 '17 at 23:26
  • Sounds like "double encoding", or at least "Mojibake"; see those in http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Jan 11 '17 at 02:25
  • @IvanBarayev - `COLLATION` refers to ordering, hence irrelevant. `CHARACTER SET` refers to encoding. – Rick James Jan 11 '17 at 02:27

1 Answer

Do not use the PHP utf8_encode() or utf8_decode() functions.

Despite their promising-sounding names, what these functions actually do is mangle UTF-8 text: utf8_encode() double-encodes text that is already UTF-8, and utf8_decode() converts text to the ISO-8859-1 encoding and replaces characters outside the Latin-1 range with question marks.
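
The mangling described above can be reproduced without a database. The following is a minimal sketch, assuming the mbstring extension is available and the source file is saved as UTF-8. It uses `mb_convert_encoding()` as a stand-in for the two functions (ISO-8859-1 to UTF-8 for `utf8_encode()`, UTF-8 to ISO-8859-1 for `utf8_decode()`), because the originals are deprecated as of PHP 8.2. The variable names are illustrative only.

```php
<?php
// Sample text from the question; assumed to reach PHP as valid UTF-8 (11 bytes).
$original = 'ⓄⒼקร';

// Stand-in for utf8_encode(): every input byte is treated as one ISO-8859-1
// character and re-encoded as UTF-8. Text that was already UTF-8 becomes
// double-encoded, so each of the 11 bytes turns into 2 bytes.
$doubleEncoded = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Stand-in for utf8_decode(): characters with no ISO-8859-1 equivalent are
// replaced by the substitute character ("?" by default).
$lossy = mb_convert_encoding($original, 'ISO-8859-1', 'UTF-8');

// Double-encoding can be undone by reversing the conversion once.
// The lossy conversion cannot be undone: the original characters are gone.
$roundTrip = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');

echo strlen($original), " bytes before, ", strlen($doubleEncoded), " bytes after\n";
echo $lossy, "\n";
echo $roundTrip === $original ? "round trip restores the text\n" : "round trip failed\n";
```

The doubled byte count is a quick way to spot double-encoded data in a table: a correctly stored copy of this sample should be 11 bytes long, not 22.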

Remove the call to utf8_encode(), make sure your database table has the proper encoding (CHARACTER SET = utf8mb4), and you should be fine.
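
The thread does not show how the database connection is opened, so the following is only a sketch of one further thing that is commonly checked in this situation: the character set of the mysqli connection itself, which is separate from the table's character set. The helper names, the table `tags`, the column `tag`, and the connection parameters are all hypothetical. `mysqli::set_charset()`, `prepare()`, `bind_param()` and `execute()` are the standard mysqli calls.

```php
<?php
// Hypothetical helper: open a mysqli connection that uses utf8mb4.
// Host, user, password and database name are placeholders, not from the question.
function openUtf8mb4Connection(string $host, string $user, string $pass, string $dbName): mysqli
{
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $db = new mysqli($host, $user, $pass, $dbName);

    // Tell the server that this client sends and expects utf8mb4 text.
    $db->set_charset('utf8mb4');

    return $db;
}

// Hypothetical helper: store the submitted value unchanged, with no
// utf8_encode() call. Table and column names are assumptions for illustration.
function insertTag(mysqli $db, string $tag): void
{
    $stmt = $db->prepare('INSERT INTO tags (tag) VALUES (?)');
    $stmt->bind_param('s', $tag);
    $stmt->execute();
    $stmt->close();
}
```

Under these assumptions, usage would look like `insertTag(openUtf8mb4Connection('localhost', 'user', 'secret', 'mydb'), $_POST['tag']);`. The same connection setup would also apply to the script that reads the rows back for display.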

  • I changed the collation of the database and all tables to `utf8mb4_general_ci`, I left `<meta charset="utf-8">` in place on the form submission page, I left the `accept-charset="UTF-8"` attribute on the form tag, and I commented out `//$_POST['tag']=utf8_encode($_POST['tag']);`. The result is that the table gets `â“Â`.
    – DMSJax Jan 10 '17 at 23:52
  • I notice on SO, when I look at the structure of the 'ask a question' page, that the page doesn't declare a charset and the form doesn't include the accept-charset attribute. Is there any reason why one or both of those items could cause the encoding to get mangled? – DMSJax Jan 11 '17 at 00:02
  • `â“` is "double-encoding" for `â`. So see my link in a previous comment. And, again, do not use the en/decode functions. – Rick James Feb 20 '17 at 01:31