1

I coded a PHP project in ISO 8859-1, and for technical reasons I now want to convert the project to UTF-8. What is the best way to do it? I am afraid of losing special characters such as French accents. Thanks for your advice.

soulmerge
P.M
  • this might help: http://stackoverflow.com/questions/910793/php-detect-encoding-and-make-everything-utf-8 – Aziz Nov 30 '09 at 23:19
  • I assume that you're talking about saving PHP source code files as UTF-8 instead of ISO-8859-1: Have you tested it anyway? ISO-8859-1 characters fall in the same UTF-8 range as well (but not vice versa). If so, what problems exactly did you have when converting? – BalusC Nov 30 '09 at 23:20
  • @BalusC That is not entirely true. They both have a common subset, known as ASCII, but half of ISO-8859-1 is encoded differently in UTF-8. – troelskn Dec 01 '09 at 00:01
  • True, but this does not apply if you use a tool which can open them as ISO-8859-1 and save them as UTF-8. The other way round isn't possible. The average text editor/IDE can perfectly do that. – BalusC Dec 01 '09 at 00:49
  • Thanks, guys, for the advice. I changed the encoding in the editor and copied all of my old files into new files. This is odd, but I didn't want to write extra lines of code to decode/encode. – P.M Dec 01 '09 at 18:18

3 Answers

1

Transcode all the files with iconv. Change any and all HTTP headers or meta tags that declare the charset. Profit.
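One way to find the headers and meta tags that still need changing is to search the project for leftover Latin-1 charset labels. This is only a minimal sketch, assuming a POSIX shell with find and grep; the function name find_latin1_declarations and the search pattern are illustrative and not part of the answer.

```shell
# List files that still mention a Latin-1 charset label (case-insensitive).
# Hypothetical helper: the pattern only covers the two most common spellings,
# so treat an empty result as a hint, not as proof that nothing is left.
find_latin1_declarations() {
    find "$1" -type f -exec grep -ilE 'iso-8859-1|latin1' {} +
}
```

Calling `find_latin1_declarations ./myproject` (the path is a placeholder) prints one file path per hit; each hit is a header() call, meta tag or config line to review by hand.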

just somebody
1

Try the shell command iconv to convert the PHP files from latin1 (ISO-8859-1) to UTF-8.
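A batch conversion could look roughly like the sketch below. It assumes a POSIX shell, an installed iconv, and that every .php file that is not yet valid UTF-8 really is ISO-8859-1; the function name convert_tree is made up for illustration. Files that already validate as UTF-8 (including pure ASCII files) are skipped, so running it twice should not double-encode accents. Work on a copy or under version control.

```shell
# Convert every .php file under a directory from ISO-8859-1 to UTF-8 in place.
# Hypothetical helper; assumes the non-UTF-8 files are all ISO-8859-1.
convert_tree() {
    find "$1" -type f -name '*.php' | while IFS= read -r file; do
        # iconv fails on invalid UTF-8 input, so success here means the file
        # is already UTF-8 (or plain ASCII) and can be left alone.
        if iconv -f UTF-8 -t UTF-8 "$file" >/dev/null 2>&1; then
            continue
        fi
        tmp="$file.utf8.tmp"
        if iconv -f ISO-8859-1 -t UTF-8 "$file" >"$tmp"; then
            # Overwrite the contents instead of moving the temp file,
            # so the original file keeps its permissions.
            cat "$tmp" >"$file"
        else
            echo "failed to convert: $file" >&2
        fi
        rm -f "$tmp"
    done
}
```

Usage would be something like `convert_tree ./myproject`, where the path is a placeholder. A Latin-1 file whose bytes happen to form valid UTF-8 would be skipped by the guard, which is rare for accented French text but worth a spot check.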

After that, make sure that PHP uses UTF-8 as the default encoding (the default_charset directive in php.ini). If not, you can set it with ini_set() for your project.

After that, you should convert your database to UTF-8 or use a quick fix like this (for MySQL):

mysql_query("SET NAMES 'utf8'");

Of course, replace mysql_query() with the equivalent call in whatever framework you use (if any). Put it in your primary file, the one that includes all the classes and so on.

1

Here's my take on your question: you want the HTML generated by PHP to be UTF-8 compliant? Be aware that ISO-8859-1 has long been the default that HTTP and many browsers assume for HTML when no charset is declared, while XML (and so XHTML served as XML) is read as UTF-8 unless an encoding is declared.

(1) So the first piece of the puzzle is to select your DOCTYPE for your rendered HTML.

(2) Make sure you add the character set meta tag (charset=utf-8), etc.

(3) Take the rendered PHP/HTML string and send it through iconv, either via the shell using a system call or through a PHP function such as iconv().

The resulting rendered HTML will be UTF-8 encoded. The client browser needs to interpret the HTML as UTF-8 and not as Western Latin-1. Otherwise you get a strange non-breaking space character in the upper left-hand corner of the page.
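Step (3) can be tried from the shell before touching any PHP code. The following is a minimal sketch, assuming the rendered page is ISO-8859-1 and iconv is installed; to_utf8 and page.php are hypothetical names used only for illustration.

```shell
# Filter: read ISO-8859-1 text on stdin, write the same text as UTF-8 on stdout.
# Hypothetical helper; it must only be fed Latin-1 input, because text that is
# already UTF-8 would come out double-encoded.
to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# Example (page.php is a placeholder):
#   php page.php | to_utf8 > page.html
```

The filter only changes the bytes; the page still has to declare UTF-8 as in step (2), or the browser may decode it as Latin-1 anyway.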

tracy.brown
  • And there's always the quick and dirty way: send the rendered HTML through MySQL using a dumb query, e.g. SELECT \ as 'html'. This assumes you have MySQL and have its character encoding defaulting to UTF-8 (SET NAMES works also). – tracy.brown Dec 01 '09 at 00:49