I'm currently using a MySQL database, and the previous maintainer changed its character set from ISO-8859-1 to UTF-8. The problem now is that every ä turns into Ã¤. I've written code to convert all of the records in the entire database, but apparently some words are already written correctly. For example, there is a word like PÃ¶ytÃ¤krono and a word like Sisäänkirjautuminen. If I use iconv('UTF-8', 'ISO-8859-1', 'PÃ¶ytÃ¤krono') it gives Pöytäkrono, but iconv('UTF-8', 'ISO-8859-1', 'Sisäänkirjautuminen') gives S. Because the database is quite big I want to do this automatically, but only the values that are wrong should be changed, not the ones that are already written correctly.

Basic
  • MySQL is only one of the actors. See [UTF-8 all the way through](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – Álvaro González Apr 01 '13 at 08:59
  • So you have *some* data in your database which has a screwed up encoding and you have *some* data which is correct? Or does [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/) help? – deceze Apr 01 '13 at 08:59

1 Answer

You can change the database storage encoding just like that, and it will work in the sense that the database now stores strings in UTF-8. That doesn't buy you anything by itself, though.

Other things need to change as well:

  • The text editor encoding needs to be set to UTF-8. String literals in PHP source code have whatever encoding your text editor saves the file in.

  • The database<->PHP transport (connection) encoding, which your code probably never sets, so it defaults to ISO-8859-1. For UTF-8 you need to explicitly call mysql_set_charset("utf8") before making queries.

  • The website encoding declaration, which also defaults to ISO-8859-1. You need to explicitly call header("Content-Type: text/html; charset=UTF-8") or configure, for example, Apache to send it automatically.
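
None of these settings repairs rows that were already stored double-encoded, which is what the question is really about. Here is a minimal sketch of a selective repair. It assumes the damage is exactly one round of UTF-8 bytes being misread as ISO-8859-1: convert the value back, and keep the result only if it is itself valid UTF-8. The function name repair_double_encoded is made up for illustration. MySQL's latin1 is really closer to Windows-1252, so values containing characters such as € may need a different target charset than the one assumed here.

```php
<?php
// Hypothetical helper, not part of any library: undo one round of
// "UTF-8 bytes misread as ISO-8859-1 and stored again as UTF-8".
function repair_double_encoded(string $value): string
{
    // Only strings made up entirely of U+0000..U+00FF can be this kind of
    // mojibake. preg_match() returns false for invalid UTF-8 input, which
    // also fails this check, so such values are left alone.
    if (preg_match('/^[\x{00}-\x{FF}]*$/u', $value) !== 1) {
        return $value;
    }

    // Map every character back to the single byte it was misread from.
    $bytes = iconv('UTF-8', 'ISO-8859-1', $value);
    if ($bytes === false || $bytes === $value) {
        // Conversion failed, or the value was plain ASCII: nothing to repair.
        return $value;
    }

    // An empty pattern with the /u modifier matches only valid UTF-8.
    // If the converted bytes are not valid UTF-8, the original value was
    // already correct and must not be touched.
    if (preg_match('//u', $bytes) !== 1) {
        return $value;
    }

    return $bytes;
}
```

With the source file saved as UTF-8, repair_double_encoded('PÃ¶ytÃ¤krono') returns Pöytäkrono, while repair_double_encoded('Sisäänkirjautuminen') returns its input unchanged, because converting it would produce bytes that are not valid UTF-8. A correctly stored value that happens to look like mojibake would still be rewritten, so try it on a copy of the data first.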
Esailija
  • But UTF-8 has the problem that it can't store or display ä or ö. I've solved it with a regex: $regex = "(([äöÄÖa-zA-Z0-9]*+\s*)+)"; this leaves the values that are displayed correctly alone but still makes it possible to change the wrong values. – user1806834 Apr 02 '13 at 07:47