
I have a latin1 table (latin1_swedish_ci) that works great for strings like "SÉRIE TÉLÉVISÉE", but when I use a string like "ẼFINI", it changes the first character to &#7868;. Now this ultimately works (it is displayed as "Ẽ"), but I'm just curious: what other characters would get this treatment?

The impact is that my string runs out of space because of the extra characters used, so this question isn't entirely academic. I'm considering going to UTF8, i.e.

ALTER TABLE description CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

Scott C Wilson
  • Q: What is the database? MSSQL? MySQL? Look here: http://stackoverflow.com/questions/4769517/mysql-collation-latin1-swedish-ci-vs-utf8-general-ci and here [Difference between Encoding and Collation](http://stackoverflow.com/questions/7723648/difference-between-encoding-and-collation) – paulsm4 Jul 27 '15 at 01:05
  • Of possible interest to you: http://stackoverflow.com/questions/8879564/latin-1-to-utf-8-database – Tim Biegeleisen Jul 27 '15 at 01:05
  • @paulsm4 No, it is mysql. – Scott C Wilson Jul 27 '15 at 01:06
  • @TimBiegeleisen Yes, I am familiar with the conversion procedure; I'm just trying to measure the gap between these representations to see what other characters would get hit if I didn't convert. – Scott C Wilson Jul 27 '15 at 01:08
  • @ScottWilson You can look this up on a site like WikiPedia. But even if you convince yourself that you are safe now with only Latin1, how could you be certain that you would not have a problem later on with new data? I'd bite the bullet now, but that's just my choice. – Tim Biegeleisen Jul 27 '15 at 01:09

1 Answer


I don't have a specific list of the characters that take 1 byte in Latin1 but multiple bytes in UTF-8, but here's a good illustration that "the problem is more than academic":

When to use utf-8 and when to use latin1 in MySQL?


Strong suggestion: go with UTF-8, if at all possible.
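As for which characters are affected: a rough way to check is to test whether a string can be encoded in a single-byte Western European code page at all. The sketch below is only an approximation. It assumes Python's cp1252 codec as a stand-in for MySQL's latin1 (MySQL additionally maps a handful of bytes that cp1252 leaves undefined), and the helper names fits_latin1 and stored_form are made up for illustration. It also assumes the longer stored value is a numeric character reference substituted for the unrepresentable character, which would explain the extra length.

```python
# Assumption: cp1252 approximates MySQL's latin1. MySQL also accepts bytes
# 0x81, 0x8D, 0x8F, 0x90 and 0x9D, which Python's cp1252 codec rejects.

def fits_latin1(text: str) -> bool:
    """Return True if every character has a single-byte cp1252 encoding."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


def stored_form(text: str) -> bytes:
    """Encode to cp1252, replacing unrepresentable characters with &#NNNN;."""
    return text.encode("cp1252", errors="xmlcharrefreplace")


# Escapes keep the sample strings unambiguous: \u00c9 is É, \u1ebc is Ẽ.
samples = ["S\u00c9RIE T\u00c9L\u00c9VIS\u00c9E", "\u1ebcFINI"]

for sample in samples:
    print(
        sample,
        "| fits:", fits_latin1(sample),
        "| stored:", stored_form(sample),
        "| stored length:", len(stored_form(sample)),
        "| UTF-8 length:", len(sample.encode("utf-8")),
    )
```

Under those assumptions, anything outside the code page (most of Latin Extended Additional, Greek, Cyrillic, CJK, emoji) gets the longer form, while accented characters such as É stay at one byte in latin1 and take two bytes in UTF-8.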

ADDENDUM:

As Tim Biegeleisen said above:

@ScottWilson You can look this up on a site like WikiPedia. But even if you convince yourself that you are safe now with only Latin1, how could you be certain that you would not have a problem later on with new data? I'd bite the bullet now, but that's just my choice.

paulsm4