
I'm working on an application using the CakePHP framework, and in the past I've run into a few problems with encoding.

To avoid these issues in my application, I started doing some research, but I'm still a little confused about the how and the why.

My application will need to support all languages, yes, even languages such as Chinese. Most of the data will be stored in a MySQL database, and that's where the confusion starts: which collation should I use?

Based on what I've read over the past few days, I've come to the conclusion that the best choice of collation would be utf8_unicode_ci. Is this correct?

Now on to PHP: what should I set as the encoding? UTF-8? I need to be completely sure that not a single character is displayed incorrectly. Content will be submitted through forms, so the output has to be identical to the input.

I hope someone can answer my questions and help clarify this for me. Thanks in advance.

Kevin Vandenborne

2 Answers


You need the UTF-8 encoding to store your data. A collation, however, is what determines how strings are sorted and compared. Unfortunately, there is no universal collation, and there cannot be one, because the sorting rules of different languages contradict each other.

For example, in Czech 'ch' is treated as a single letter that sorts after 'h', whereas most other Latin-script languages sort it among the words beginning with 'c'.
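To make the conflict concrete, here is a toy Python sketch with two sort keys: one that compares letters one by one, as most Latin-script languages do, and one that applies a simplified Czech rule in which the digraph "ch" counts as a single letter between "h" and "i". The function names are hypothetical, and the Czech key deliberately ignores diacritics and every other real Czech rule; it only shows that the same three words come out in two different orders.

```python
# Toy sort keys showing that "correct" alphabetical order depends on language.
# Assumption: simplified rules only; real collations (e.g. the Unicode
# Collation Algorithm with language tailorings) are far more detailed.

def letter_by_letter_key(word: str) -> list[float]:
    # Compare character by character, as most Latin-script languages do.
    return [float(ord(c)) for c in word.lower()]


def simplified_czech_key(word: str) -> list[float]:
    # Treat the digraph "ch" as one letter placed between "h" and "i".
    # Diacritics and all other Czech rules are ignored in this sketch.
    w = word.lower()
    key: list[float] = []
    i = 0
    while i < len(w):
        if w.startswith("ch", i):
            key.append(ord("h") + 0.5)  # sorts after "h", before "i"
            i += 2
        else:
            key.append(float(ord(w[i])))
            i += 1
    return key


if __name__ == "__main__":
    words = ["hrad", "chata", "cibule"]
    print(sorted(words, key=letter_by_letter_key))  # ['chata', 'cibule', 'hrad']
    print(sorted(words, key=simplified_czech_key))  # ['cibule', 'hrad', 'chata']
```

Neither order is wrong; each is correct for its own language, which is why a single database-wide collation can only ever be a compromise.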

Stepan Vihor

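Yes, utf8_unicode_ci is a sane choice when you don't know the language in advance. One caveat: MySQL's utf8 character set stores at most 3 bytes per character, so it cannot hold emoji or some rarer CJK characters; on MySQL 5.5.3 and later, utf8mb4 with utf8mb4_unicode_ci covers all of Unicode. As for PHP, I'll just link to some answers I wrote in the past: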

How to best configure PHP to handle a UTF-8 website
Croatian diacritic signs in MySQL db (utf-8)
Am I correctly supporting UTF-8 in my PHP apps?
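Whether the legacy utf8 character set is enough depends on the text you actually store. MySQL's utf8 (also called utf8mb3) accepts at most 3 bytes per character, i.e. only code points up to U+FFFF, while utf8mb4 accepts the full 4-byte range. The hypothetical helpers below are a minimal Python sketch of that check, useful for testing sample input before choosing a column charset; they do not talk to MySQL.

```python
# Sketch: does a string fit MySQL's 3-byte "utf8" (utf8mb3), or does it
# need utf8mb4? Hypothetical helpers; no database access involved.

def max_utf8_bytes(text: str) -> int:
    """Largest number of bytes any single character needs in UTF-8."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_utf8mb3(text: str) -> bool:
    # utf8mb3 stores at most 3 bytes per character, which covers only the
    # Basic Multilingual Plane (code points U+0000 to U+FFFF).
    return max_utf8_bytes(text) <= 3


if __name__ == "__main__":
    samples = {
        "ASCII": "Hello",
        "Chinese (BMP)": "中文",
        "CJK Extension B": "\U00020000",
        "Emoji": "\U0001F600",
    }
    for label, text in samples.items():
        print(label, max_utf8_bytes(text), fits_utf8mb3(text))
```

Common Chinese characters need 3 bytes and fit, but anything outside the BMP needs 4, which is why utf8mb4 is the safer default when "all languages" is a requirement.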

One additional piece of advice: make sure your text editor saves all files as UTF-8 (without a BOM, if you have that option). In short, keep everything in UTF-8 from the very beginning and you should be safe.
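The BOM matters in PHP because its three bytes (EF BB BF) sit before the opening <?php tag and are sent to the browser as output, which can break later header() or session calls. The hypothetical helpers below, a minimal Python sketch rather than part of any PHP tooling, detect and strip a UTF-8 BOM from a file's bytes, and simulate what happens when one layer writes UTF-8 but another reads the bytes as Latin-1: no bytes are lost, yet the text is displayed wrongly.

```python
import codecs

# Sketch: two ways a "not everything is UTF-8" setup shows up.
# Hypothetical helpers for illustration only.

def has_utf8_bom(data: bytes) -> bool:
    # A UTF-8 BOM is the byte sequence EF BB BF at the very start.
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    # Remove the BOM if present; otherwise return the bytes unchanged.
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


def mismatched_roundtrip(text: str) -> str:
    # One layer encodes as UTF-8, another decodes as Latin-1 (mojibake).
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    source = codecs.BOM_UTF8 + b"<?php echo 'hi';"
    print(has_utf8_bom(source))            # True
    print(strip_utf8_bom(source))          # b"<?php echo 'hi';"
    print(mismatched_roundtrip("é"))       # Ã©
    # Consistent UTF-8 on both sides gives back exactly what went in.
    print("é".encode("utf-8").decode("utf-8") == "é")  # True
```

The last line is the property the question asks for: output identical to input holds only when every layer agrees on UTF-8.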

djn