1

I have a database containing the translations for several keywords.

When I read those translations back, many of the French words contain characters that don't appear in English, the accented e for example, and these are not displayed correctly. (I will later be translating to other languages, and I'm sure the same issue will come up.)

Keeping in mind that these translations are coming from a MySQL database, is there any way around this? I tried changing the charset from UTF-8 to ISO 8859-1, but that didn't make any difference.

Any guidance would be very warmly received! Thanks

jollinski

3 Answers

0

Going by http://msdn.microsoft.com/en-us/library/ms186939.aspx, I would say that

VARCHAR(n) CHARSET ucs2

is the closest equivalent. But I don't see many people using this; more often people use:

VARCHAR(n) CHARSET utf8

As far as I am aware, the utf8 and ucs2 character sets allow for the same characters; only the encoding is different. So either one should work, and I would probably go for the latter since it is used more often.

There are some limitations in MySQL's Unicode support that may or may not apply to your use case. Please refer to http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html for more info on MySQL's Unicode support.

Copied from: mysql equivalent data types

Community
Adam Plocher
  • This doesn't really matter; you can store any symbol in any type of field. What matters is the collation and how the words get into the database. They might arrive in an encoding other than utf8. – GTodorov Oct 01 '12 at 16:51
0

I would set the client encoding to UTF-8 and the database collation to utf8_bin. How exactly do you insert the words into your database? Do you translate them automatically and insert them, or do you do this by hand? You can also set your connection collation to UTF8.

-------- UPDATE -------

To enforce the connection character set in PHP, add the following right after connecting:

mysql_query('SET NAMES utf8');

If that doesn't work, try also forcing the character set to utf8 in your script:

mysql_query('SET CHARACTER SET utf8');

Remember to clear out your existing data and re-insert it after making these changes, before you query the database again.
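
The mysql_query() function used above was removed in PHP 7.0, so on a current PHP version the same idea needs a different API. Below is a minimal sketch, assuming PDO with the MySQL driver; the helper names build_mysql_dsn and open_connection and the host and database names are hypothetical. It puts the charset in the DSN, which has much the same effect as running SET NAMES right after connecting.

```php
<?php
// Hypothetical helper: builds a PDO DSN that fixes the connection character set.
// PDO_MySQL honours the charset parameter in the DSN from PHP 5.3.6 onward.
function build_mysql_dsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: opens a connection that talks to the server in utf8,
// so no separate SET NAMES query is needed.
function open_connection(string $host, string $dbname, string $user, string $password): PDO
{
    return new PDO(build_mysql_dsn($host, $dbname), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw on SQL errors
    ]);
}

// Example (assumed host and database name):
echo build_mysql_dsn('localhost', 'translations'), PHP_EOL;
```

With mysqli, the equivalent call on an open connection is $mysqli->set_charset('utf8').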

GTodorov
  • Hi, thanks. I insert them by hand with the correct accents. They appear ok in the database. I have done as you suggested. The accented Es for instance are still appearing as diamonds with question marks inside. Also, with utf8_bin, the database entries turn into a string of numbers and letters. – jollinski Oct 01 '12 at 17:05
  • If you convert the database from one collation to another you won't be able to read the data; you'll just get messed-up symbols. There are conversion techniques for this, but they don't always work 100% well. What you should do is create the database with the right collation and then insert the data. Make sure you have the right UTF-8 encoding for the client. What are your my.ini settings for default-character-set, character_set_server and default-collation? Take a look at the update above. – GTodorov Oct 01 '12 at 18:19
  • Hello. Thank you very much for that. The update suggestion worked a treat. I'm going to have a play around with different charsets though. At the moment, the use of strtoupper() doesn't work properly with the current sets. é doesn't become É, instead it stays as é. Thank you very much for the help. My default charset looks as though it is ISO-8859-1. – jollinski Oct 01 '12 at 22:29
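
On the strtoupper() problem mentioned in the last comment: strtoupper() works byte by byte, so it cannot map the two-byte UTF-8 sequence for é to É. A minimal sketch, assuming the data is UTF-8 and the mbstring extension is loaded, uses mb_strtoupper() with an explicit encoding instead. The sample word is only an illustration.

```php
<?php
// "été" written with explicit UTF-8 bytes, so the example does not depend
// on how this file itself is saved.
$lower = "\xC3\xA9t\xC3\xA9";

// mb_strtoupper() converts character by character in the given encoding.
$upper = mb_strtoupper($lower, 'UTF-8');

echo $upper, PHP_EOL;
```

Passing the encoding explicitly avoids relying on the mbstring default, which may differ between servers.
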
0

Well, ISO 8859-1 actually has an accented e in the charset. From what it sounds like, you have UTF8 chars in your DB, but want those pesky foreign characters nicely made into pure English chars in the result. You can do this with collation. Collate your results into something like latin1_swedish_ci.

I believe there's also an entry in vars that you can just set so you don't have to specify collation in every query, but I'm too lazy to look it up :p

tazer84
  • I have a table which contains a set of keywords and a set of corresponding words in several languages. When I select French as the language, the accented Es appear as the old diamond with a question mark. So I do want the accents. I know that they are contained within the 8859 charset, so I thought it was strange that they still aren't being displayed correctly :S – jollinski Oct 01 '12 at 17:10
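
One likely explanation for the diamonds, assuming the page is served as UTF-8 while the connection returns ISO 8859-1 bytes: é is the single byte 0xE9 in ISO 8859-1, and that byte on its own is not valid UTF-8, so the browser substitutes the replacement character. The sketch below (requires mbstring; variable names are illustrative) shows the mismatch and what the same character looks like once re-encoded.

```php
<?php
// "é" as a single ISO 8859-1 byte, the form a latin1 connection would return.
$latin1 = "\xE9";

// A lone 0xE9 byte is not a valid UTF-8 sequence.
$validBefore = mb_check_encoding($latin1, 'UTF-8');

// Re-encoded to UTF-8, the same character becomes the two bytes 0xC3 0xA9.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$validAfter = mb_check_encoding($utf8, 'UTF-8');

var_dump($validBefore, $validAfter);
echo bin2hex($utf8), PHP_EOL;
```

Converting each string this way is only a diagnostic; making the connection charset and the page charset agree fixes the problem at the source.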