To store Chinese characters in MySQL, is it recommended to store them as UTF8 or UCS2? (I am using CHAR and VARCHAR columns.)

Also, I have seen that UTF8 uses 4 bytes of data to store values. How many does UCS2 use?

Pekka
David19801

1 Answer

I have seen that UTF8 uses 4 bytes of data to store values. How many does UCS2 use?

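UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character. MySQL's utf8 character set stores at most 3 bytes per character, which covers the Basic Multilingual Plane (BMP) and therefore the commonly used Chinese characters; utf8mb4 is needed for the 4-byte characters beyond it. UCS-2 is a fixed 2 bytes per character and covers only the BMP. It is not the same as UTF-16, which adds 4-byte surrogate pairs for characters outside the BMP.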
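
To make the byte counts concrete, here is a minimal Python sketch that measures how many bytes single characters take in UTF-8 and in UTF-16. The helper name `encoded_widths` is made up for this illustration, and Python's `utf-16-be` codec is used as a stand-in for UCS-2, which should be equivalent for BMP characters only. It says nothing about how MySQL lays out rows on disk.

```python
def encoded_widths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for a string.

    "utf-16-be" is used so that no byte-order mark is added; for BMP
    characters the result matches a fixed-width 2-byte UCS-2 encoding.
    """
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


# Hypothetical sample characters chosen for this illustration.
samples = {
    "Latin A": "A",
    "Latin e with acute": "\u00e9",
    "Chinese zhong (BMP)": "\u4e2d",
    "CJK Extension B (outside BMP)": "\U00020000",
}

for label, ch in samples.items():
    utf8_len, utf16_len = encoded_widths(ch)
    print(f"{label}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

The last sample needs 4 bytes in both encodings, which is the case that neither UCS-2 nor a 3-byte UTF-8 variant can represent.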

To store Chinese characters in MySQL, is it recommended to store them as UTF8 or UCS2?

I have no experience with Chinese characters, but the top answer to this Stack Overflow question covers the basics quite nicely: Difference between UTF-8 and UTF-16?

From there:

Most reasonable characters, like Latin, Cyrillic, Chinese, Japanese can be represented with 2 bytes. Unless really exotic characters are needed, this means that the 16-bit subset of UTF-16 can be used as a fixed-length encoding, which speeds indexing.

It seems that for Chinese characters, UCS-2 tends to save storage space: a common Chinese character takes 2 bytes in UCS-2 but 3 bytes in UTF-8. If this is for a web project, however, I would tend to use UTF-8 because it is the more widespread encoding and the standard on the web. Additional arguments for UTF-8 here: Should UTF-16 be considered harmful?
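
The trade-off depends on how much ASCII is mixed in with the Chinese text. A rough Python sketch, assuming two made-up sample strings and again using `utf-16-be` as a stand-in for UCS-2 on BMP-only text:

```python
def storage_bytes(text):
    """Return the encoded size of text in UTF-8 and in 2-byte UTF-16/UCS-2."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "ucs-2": len(text.encode("utf-16-be")),  # same as UCS-2 for BMP-only text
    }


# Hypothetical samples: pure Chinese text versus Chinese inside HTML markup.
pure_chinese = "数据库字符集"
html_snippet = '<p class="title">数据库</p>'

print(storage_bytes(pure_chinese))  # 6 BMP characters
print(storage_bytes(html_snippet))  # 21 ASCII characters plus 3 BMP characters
```

For the pure Chinese string the 2-byte encoding is smaller (12 bytes against 18), while for the markup-heavy string UTF-8 is smaller (30 bytes against 48), so the saving from UCS-2 only holds when the stored values are mostly Chinese.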


MySQL Reference: 9.1.10. Unicode Support

Community
Pekka