16

I have an old MySQL database with encoding set to UTF-8. I am using Ado.Net Entity framework to connect to it.

The string that I retrieve from it have strange characters when ë like characters are expected.

For example: "ë" is "ë".

I thought I could get this right by converting from UTF8 to UTF16.

 return Encoding.Unicode.GetString(                
            Encoding.Convert(
            Encoding.UTF8,
            Encoding.Unicode,
            Encoding.UTF8.GetBytes(utf8)));
    }

This however doesn't change a thing.

How could I get the data from this database in proper form?

Peter
  • 511
  • 3
  • 5
  • 11

6 Answers6

38

There are two things that you need to do to support UTF-8 in the ADO.NET Entity frame work (or in general using the MySQL .NET Connector):

  1. Ensure that the collation of your database of table is a UTF-8 collation (i.e. utf8_general_ci or one of its relations)
  2. Add Charset=utf8; to your connection string.

    "Server=localhost;Database=test;Uid=test;Pwd=test;Charset=utf8;"
    

I'm not certain, but the encoding may be case sensitive; I found that CharSet=UTF8; did not work for me.

satnhak
  • 9,407
  • 5
  • 63
  • 81
3

Even if the database is set to UTF8 you must do the following things to get Unicode fields to work correctly:

  1. Ensure you are using a Unicode field type like NVARCHAR or TEXT CHARSET utf8
  2. Whenever you insert anything into the field you must prefix it with the N character to indicate Unicode data as shown in the examples below
  3. Whenever you select based on Unicode data ensure you use the N prefix again

MySqlCommand cmd = new MySqlCommand("INSERT INTO EXAMPLE (someField) VALUES (N'Unicode Data')");

MySqlCommand cmd2 = new MySqlCommand("SELECT * FROM EXAMPLE WHERE someField=N'Unicode Data'");

If the database wasn't configured correctly or the data was inserted without using the N prefix it won't be possible to get the correct data out since it will have been downcast into the Latin 1/ASCII character set

RobV
  • 28,022
  • 11
  • 77
  • 119
  • I'm afraid that that downcast issue is the case. The tables are in UTF8, but the fields are just VARCHAR. Is there no way to "upcast" it to utf8 or whatever coding to get the data back? – Peter Jun 03 '09 at 12:07
  • Strangely, the Ruby on Rails app that uses the database doesn't seem to have any problems with getting the data correctly – Peter Jun 03 '09 at 12:21
1

Try set the encoding by "set names utf8" query. You can set this parameter in mysql config too.

erenon
  • 18,838
  • 2
  • 61
  • 93
1

As others have said this could be a db issue, but it could also be caused by using an old version of the .net mysql connector.

What I actually wanted to comment on was the utf8 to utf16 conversion. The string you are trying to convert is actually alreay unicode encoded, so your "ë" characters actually takes up 4 bytes (or more) and are no longer, at the point of your conversion, a misrepresentation of the "ë" character. That is the reason why your conversion doesn't do anything. If you want to do a conversion like that I think you would have to encode your utf8 string as a old style 1 byte per character string, using a codepage where the byte values of à and « actually represent the utf8 byte sequence of ë and then treat the bytes of this new string as an utf8 string. Fun stuff.

JJJ
  • 509
  • 1
  • 6
  • 14
0

thank you The Mouth of a Cow , your solution works but still we need converting characters. i think this is your problem :) and for converting characters you can use this code

 System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;

 string s = "unicode";

 //string to utf
 byte[] utf = System.Text.Encoding.UTF8.GetBytes(s);

 //utf to string
 string s2= System.Text.Encoding.UTF8.GetString(utf);
Farhan
  • 96
  • 2
  • 11
0
"Server=localhost;Database=test;Uid=test;Pwd=test;Charset=utf8;"

It worked - PowerShell 7.2, MySQL Connector 8.0.29

Suraj Rao
  • 29,388
  • 11
  • 94
  • 103