
I'm developing a web app with ASP.NET Core 3.1 and trying to store input data from a POST request.

First, I should say that the inputs are in a non-English language.

Locally everything works fine. However, when I publish the app and store data in the MySQL database, the values come out wrong (as a series of question marks). My first thought was that I had used an inappropriate charset and encoding, so I changed the encoding to the closest match to what I have locally. That didn't work. I also analyzed the HTTP request and found no problems.
Then I tried inserting the data directly in phpMyAdmin with an INSERT statement, and that worked fine.

Local collation: utf8mb4_0900_ai_ci
Remote collation: utf8mb4_unicode_ci

Any help would be appreciated.

  • Is it the same MySQL database instance or a new one? My guess is that your problem is collation-related. https://www.mysqltutorial.org/mysql-collation/ – Athanasios Kataras Nov 10 '20 at 12:50
  • I tried to use a migration, but it returned an unsupported-encoding error, so I just created another database with the closest collation available. @AthanasiosKataras –  Nov 10 '20 at 12:56
  • There are two different types of question marks; they indicate different problems. See https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Nov 10 '20 at 20:30
  • @RickJames Thanks for your comment, but as I mentioned, the problem is not the database charset and collation, since I can insert data directly without any problems. In the answer you linked, one possible cause is "The bytes to be stored are not encoded as utf-8". Should I do anything in particular about that? –  Nov 11 '20 at 10:15

1 Answer


The connection parameters tell the server which character set the client's bytes are in. To store and retrieve data correctly, you must know that charset and announce it when connecting.

While transferring data between the client and the server, MySQL converts from that charset to the charset defined for the columns you are storing into or fetching from.

It sounds like you established the charset correctly for one client and incorrectly for the other.
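To make the conversion step concrete, here is a minimal Python simulation of what happens to the client's bytes. It is not MySQL itself. Python's `latin-1` codec stands in (an assumption) for a single-byte connection or column charset. The helper names `receive` and `convert` are hypothetical. The `convert` step only roughly mimics how a character with no equivalent in the target charset turns into `?`.

```python
def receive(raw: bytes, announced_charset: str) -> str:
    """Interpret the client's bytes using the charset announced at connect time."""
    return raw.decode(announced_charset)


def convert(text: str, column_charset: str) -> str:
    """Roughly mimic converting text into a column's charset: characters the
    charset cannot represent are replaced with '?'."""
    return text.encode(column_charset, errors="replace").decode(column_charset)


tah = "\u0637"             # Arabic TAH, as typed by the user
raw = tah.encode("utf-8")  # the bytes the client actually sends: d8 b7

# Charset announced correctly and the column can hold it: the text survives.
ok = convert(receive(raw, "utf-8"), "utf-8")

# Client sends UTF-8 bytes but announces latin-1: each byte is read as its own
# character, and that mojibake is then stored faithfully.
mojibake = convert(receive(raw, "latin-1"), "utf-8")

# The column charset cannot hold Arabic at all: the character becomes '?'.
question = convert(receive(raw, "utf-8"), "latin-1")

print(ok, mojibake, question)
```

The third case matches the symptom in the question: the bytes on the wire are fine, but the character is lost when it is converted into a charset that cannot represent it.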

If you can provide the hex of the characters as they exist in the client, I can explain further.
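One way to produce and check that hex is sketched below in Python. It assumes the client text is UTF-8. The helper names `to_hex` and `describe_hex` are hypothetical. If the hex already shows `3f` where a letter should be, the character was lost before it reached that point.

```python
import unicodedata


def to_hex(text: str) -> str:
    """Space-separated hex of the UTF-8 bytes the client would send."""
    return text.encode("utf-8").hex(" ")


def describe_hex(hex_string: str) -> list:
    """Decode UTF-8 hex and name each character.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8."""
    text = bytes.fromhex(hex_string).decode("utf-8")
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in text]


print(to_hex("\u0637"))       # d8 b7
print(describe_hex("d8 b7"))  # ['U+0637 ARABIC LETTER TAH']
print(to_hex("?"))            # 3f, a literal question mark
```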

utf8mb4_0900_ai_ci and utf8mb4_unicode_ci are different collations for the same character set (encoding), utf8mb4. Characters will pass between client and server without any transcoding. However, comparisons may cause trouble because the collations differ.

Rick James
  • One of them is d8 b7 in hex, which is code point U+0637. It looks correct when I check it against a UTF-8 table. That is on the client side; I captured it from the HTTP request in Wireshark. –  Nov 11 '20 at 21:08
  • Arabic TAH, correct? That is the correct encoding in utf8/utf8mb4. I have added to my answer. – Rick James Nov 12 '20 at 00:46
  • It seems you were right about collations. I just had to set the collation and charset when creating the table, instead of leaving them at the defaults and changing them afterwards. –  Nov 14 '20 at 16:17
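The fix described in the last comment is to declare the charset and collation when the table is created. Below, it is sketched as a small Python helper that only builds the DDL string; the generated statement is the part that matters. Several details are assumptions: the helper name `create_table_sql`, the table and column names, and the choice of utf8mb4_unicode_ci (the collation the remote server offered). The helper does no escaping, so it is only meant for fixed, trusted names.

```python
def create_table_sql(table: str, columns: dict,
                     charset: str = "utf8mb4",
                     collation: str = "utf8mb4_unicode_ci") -> str:
    """Build a CREATE TABLE statement that states the charset and collation
    explicitly instead of relying on server or database defaults."""
    cols = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE `{table}` (\n  {cols}\n) "
        f"DEFAULT CHARSET={charset} COLLATE={collation};"
    )


print(create_table_sql("posts", {"id": "INT PRIMARY KEY AUTO_INCREMENT",
                                 "body": "TEXT"}))
```

Stating the table options up front means the table never starts out with a charset that cannot hold the input.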