1

I encountered a problem of string encoding, actually I get from an Internet site this name: CENTRE ESPORTIU D'ALÀS and I'm trying to save this into my DB. Actually the database table encoding is setted to utf8_unicode_ci (using MySQL WorkBench).

When I store it in the database I'll get this: Centre Esportiu d'Al\u00e0s.

For create the connection I use this code:

MySqlConnectionStringBuilder conn = new MySqlConnectionStringBuilder();
conn.Server = "localhost";
conn.Database = "sample";
conn.UserID = "root";
conn.Password = "root";
conn.CharacterSet = "utf8";
conn.Port = json.Port;

MySqlConnection connection = new MySqlConnection(conn.ToString());
connection.Open();

I store the value in the following way:

using (MySqlConnection connection = new DBConnection().Connect())
{
    using (MySqlCommand command = new MySqlCommand())
    {
        command.Connection = connection;
        command.CommandText = "INSERT INTO venue (name) VALUES (@name)";
        command.Parameters.AddWithValue("@name", "CENTRE ESPORTIU D'ALÀS");
        command.ExecuteNonQuery();
    }
}

As I said in the DB I get: Centre Esportiu d'Al\u00e0s. Any idea?

Thanks in advance for any help.

Jidic
  • 147
  • 1
  • 8
  • https://stackoverflow.com/questions/1922199/c-sharp-convert-string-from-utf-8-to-iso-8859-1-latin1-h – Raj Jul 13 '18 at 08:38
  • @RajKamal tried already didn't worked for me – Jidic Jul 13 '18 at 08:56
  • I believe that if you define the columns in your database as `utf8mb4`, your problems will go away. – Ian Kemp Jul 13 '18 at 09:09
  • @RajKamal - That link exhibited "Mojibake", one of several ways characters can get mangled. ``\u00e0` is a unicode codepoint; fine for HTML, but not advisable for databases. – Rick James Jul 13 '18 at 18:21

1 Answers1

0

Supposing your string come from a txt file Open it on notepad++ and at the bottom right you can see the file encode. Doing that you can use the code below to "read" your string correctly:

Encoding enc = Encoding.GetEncoding("ENCODE_FROM_NOTEPAD++");
byte[] enc_bytes = Encoding.Convert(Encoding.UTF8, enc, Encoding.UTF8.GetBytes(your_str_initial_value));
string msg = enc.GetString(enc_bytes);

EDIT: I'm not sure you can solve your problem with that solution but try to edit your connection string like below:

Driver={MySQL ODBC 5.2 ANSI Driver}; //5.2 is your MySQL version

doing that you use an ansi driver

Legion
  • 760
  • 6
  • 23
  • I miss that part "Actually the database table encoding is setted to "utf8_unicode_ci" the answer is that your string is not in "utf8_unicode_ci" format ant it will be casted when you wite it in DB, i think you need to change your db encode or re-encode the string in the correct format when you read it from the DB. – Legion Jul 13 '18 at 09:01
  • which encode do you suggest? I searched `what is the best db encoding?` and the most voted answer say `utf8_unicode_ci` – Jidic Jul 13 '18 at 09:04
  • depend of the encode of your source file i think... not exist an answer that works in all cases... try to tell us the encoding you see in notepad++ and we can try to answer... i edit my answer with some more usefull stuff ;) – Legion Jul 13 '18 at 09:07
  • 1
    @Jidic - `utf8_unicode_ci` is a `COLLATION` -- for ordering. It does imply `CHARACTER SET utf8`, which is an 'encoding', but only the 3-byte characters of `UTF-8`. To get Emoji and all of Chinese use `CHARACTER SET utf8mb4`. – Rick James Jul 13 '18 at 18:18