20

How do you test your app for Iñtërnâtiônàlizætiøn compliance? I tell people to store the Unicode string Iñtërnâtiônàlizætiøn into each field and then see if it is displayed correctly on output.

That includes output as a cell's content in Excel reports, in RTF documents, in XML files, etc.
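
A minimal sketch of that check for one output format (XML), assuming Python and its standard `xml.etree.ElementTree` module. The `field` element name and the `xml_round_trip` helper are made up for illustration; a real test would go through your app's own export code instead.

```python
import xml.etree.ElementTree as ET

SAMPLE = "Iñtërnâtiônàlizætiøn"


def xml_round_trip(text: str, encoding: str = "utf-8") -> str:
    """Write text into an XML element, serialize to bytes, and parse it back."""
    root = ET.Element("field")  # hypothetical element name
    root.text = text
    data = ET.tostring(root, encoding=encoding)
    return ET.fromstring(data).text


if __name__ == "__main__":
    # The string should survive unchanged, whatever the output encoding.
    for enc in ("utf-8", "us-ascii"):
        print(enc, xml_round_trip(SAMPLE, enc) == SAMPLE)
```

The same store, export, read back, compare pattern applies to the other outputs; only the serialization step changes.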

What other tests should be done?

Added idea from @Paddy:

Also try a right-to-left language, e.g. שלום ירושלים ([The] Peace of Jerusalem). It should look like:

שלום ירושלים
(source: kluger.com)

Note: Stack Overflow is implemented correctly. If the text does not match the image, then you have a problem with your browser, OS, or possibly a proxy.

Also note: You should not have to change or "set up" your already running app to accept either the Western European characters or the Hebrew example. You should be able to just type those characters into your app and have them come back correctly in your output. In case you don't have a Hebrew keyboard lying around, copy and paste the examples from this question into your app.
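
If you can't read Hebrew, a rough automated check can stand in for eyeballing the output. The sketch below assumes Python; `is_rtl_text` and `stored_reversed` are hypothetical helper names. It only catches the case where the whole string was flipped in storage, not rendering problems on screen.

```python
import unicodedata

HEBREW_SAMPLE = "שלום ירושלים"


def is_rtl_text(text: str) -> bool:
    """True if any character has a right-to-left bidi class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def stored_reversed(expected: str, actual: str) -> bool:
    """True if the app handed back the expected text in reversed order.

    Text should be stored in logical (typing) order; the display layer
    is what lays it out right to left.
    """
    return actual != expected and actual == expected[::-1]


if __name__ == "__main__":
    print(is_rtl_text(HEBREW_SAMPLE))
    print(stored_reversed(HEBREW_SAMPLE, HEBREW_SAMPLE[::-1]))
```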

Larry K
  • Unicode support is a small part of the i18n process: dates, number formatting, currencies, sorting, social conventions, etc. http://stackoverflow.com/questions/2072491/best-practices-for-writing-software-to-be-consumed-internationally-i18n/2177674#2177674 – McDowell Jun 16 '10 at 20:40
  • I think this question would do well as a community wiki. – jtbandes Aug 12 '11 at 23:18

6 Answers

7

Pick a culture where the text reads from right to left and set your system up for that - make sure that it reads properly (easier said than done...).

Paddy
  • Ahh, +1 good point. I have also tested with Hebrew and Arabic. But I can figure out if the Hebrew is getting reversed or not. I don't have that skill for Arabic. I'll update the OP with a screenshot. – Larry K Jun 16 '10 at 16:01
6

Use one of the three "pseudo-locales" available since Windows Vista:

The three different pseudo-locales are for testing three kinds of locales:

Base: The qps-ploc locale is used for English-like pseudo localizations. Its strings are longer versions of English strings, using non-Latin and accented characters instead of the normal script. Additionally, simple Latin strings should sort in reverse order with this locale.

Mirrored: qps-mirr is used for right-to-left pseudo data, which is another area of interest for testing.

East Asian: qps-asia is intended to utilize the large CJK character repertoire, which is also useful for testing.

Windows will start formatting dates, times, numbers, and currencies in a made-up pseudo-locale that looks enough like English that you can work with it, but is obvious enough that you can tell when you're not respecting the locale:

[Шěđлеśđαỳ !!!], 8 ōf [Μäŕςћ !!] ōf 2006
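
If you are not on Windows, the same idea can be applied to your own UI strings. This is only a rough Python sketch of pseudo-localization in general, not the algorithm Windows uses: the character table, the padding ratio, and the `pseudo_localize` name are all made up for illustration.

```python
# Hypothetical lookalike table; a real one would cover the whole alphabet.
ACCENTED = str.maketrans({
    "a": "ä", "e": "ě", "i": "ï", "o": "ō", "u": "ü",
    "A": "Å", "E": "É", "I": "Ï", "O": "Ø", "U": "Ü",
    "c": "ç", "n": "ñ", "y": "ý",
})


def pseudo_localize(text: str, padding: float = 0.3) -> str:
    """Swap in accented lookalikes, pad the length, and add brackets.

    Brackets make truncation visible, padding simulates longer
    translations, and accents reveal strings that bypass localization.
    """
    swapped = text.translate(ACCENTED)
    extra = max(1, round(len(text) * padding))
    return f"[{swapped} {'!' * extra}]"


if __name__ == "__main__":
    print(pseudo_localize("Wednesday"))
    print(pseudo_localize("March"))
```

Any string that still shows up in plain English after this pass is probably hard-coded rather than pulled from your resources.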

Ian Boyd
4

There is more to internationalization than Unicode handling. You also need to make sure that dates show up localized to the user's time zone, if you know it (and make sure there's a way for people to tell you what their time zone is).

One handy fact for testing timezone handling is that there are two timezones, Pacific/Tongatapu (UTC+13) and Pacific/Midway (UTC-11), that are actually 24 hours apart. So if timezones are being handled properly, the dates should never be the same for users in those two timezones for any timestamp. If you use any other timezones in your tests, results may vary depending on the time of day you run your test suite.
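
A small Python sketch of that check. To keep it self-contained it uses fixed offsets (UTC+13 and UTC-11) as stand-ins for the two zones, which is an assumption that ignores any daylight saving rules; with a tz database available you would use `zoneinfo.ZoneInfo("Pacific/Tongatapu")` and `zoneinfo.ZoneInfo("Pacific/Midway")` instead. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the two zones (assumption: no DST).
TONGATAPU = timezone(timedelta(hours=13))
MIDWAY = timezone(timedelta(hours=-11))


def local_dates(utc_instant: datetime):
    """Return the calendar date of one instant as seen in each zone."""
    return (
        utc_instant.astimezone(TONGATAPU).date(),
        utc_instant.astimezone(MIDWAY).date(),
    )


def dates_always_differ(start: datetime, hours: int) -> bool:
    """Check every hour in the range: the two dates must be one day apart."""
    for h in range(hours):
        tonga, midway = local_dates(start + timedelta(hours=h))
        if tonga - midway != timedelta(days=1):
            return False
    return True


if __name__ == "__main__":
    start = datetime(2010, 6, 16, tzinfo=timezone.utc)
    print(dates_always_differ(start, 48))
```

In an app test you would render the same stored timestamp for a user in each zone and assert that the displayed dates differ.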

You also need to make sure dates and times are formatted in a way that makes sense for the user's locale, or failing that, that any potential ambiguity in the rendering of dates is explained (e.g. "05/11/2009 (dd/mm/yyyy)").
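
For the fallback case, a tiny Python sketch of labelling the pattern next to the date. The `with_format_hint` name is made up, and the ISO 8601 alternative is my own suggestion rather than something from the answer above.

```python
from datetime import date


def with_format_hint(d: date) -> str:
    """Render day/month/year and spell out the pattern to remove ambiguity."""
    return d.strftime("%d/%m/%Y") + " (dd/mm/yyyy)"


def iso_date(d: date) -> str:
    """Alternative: ISO 8601 (yyyy-mm-dd), which has no day/month ambiguity."""
    return d.isoformat()


if __name__ == "__main__":
    d = date(2009, 11, 5)
    print(with_format_hint(d))
    print(iso_date(d))
```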

jcdyer
3

"Iñtërnâtiônàlizætiøn" is a really bad string to test with since all the characters in it also appear in ISO-8859-1, so the string can work completely without any Unicode support at all! I've no idea why it's so commonly used when it utterly fails at its primary function!
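
You can confirm this quickly in Python, where the `latin-1` codec is ISO-8859-1. The `fits_in_latin1` helper is just an illustrative name.

```python
def fits_in_latin1(text: str) -> bool:
    """True if every character can be encoded as ISO-8859-1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # A string that fits proves nothing about Unicode support.
    print(fits_in_latin1("Iñtërnâtiônàlizætiøn"))
    print(fits_in_latin1("שלום ירושלים"))
```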

Even Chinese or Hebrew text isn't a good choice (though right-to-left is a whole can of worms by itself) because it doesn't necessarily contain anything outside 3-byte UTF-8. That was, curiously, a very large hole in MySQL's default UTF-8 implementation (which is limited to 3-byte characters) until it was fixed by the addition of the utf8mb4 charset in MySQL 5.5. These days one of the more common uses of 4-byte UTF-8 is emoji like this one: [😀]. If you don't see a pretty little coloured picture between those brackets, congratulations, you just found a hole in your Unicode stack!
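
To check whether a test string really goes beyond 3-byte UTF-8, something like this Python sketch works. The helper names are hypothetical, and the samples are written as escapes so the code does not depend on how your editor handles them.

```python
def widest_utf8_char(text: str) -> int:
    """Number of UTF-8 bytes used by the widest character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def needs_four_byte_utf8(text: str) -> bool:
    """True if text has a code point above U+FFFF (4 bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    samples = {
        "accented Latin": "Iñtërnâtiônàlizætiøn",
        "Hebrew": "\u05e9\u05dc\u05d5\u05dd",
        "Chinese": "\u4e2d\u6587",
        "emoji": "\U0001F600",
    }
    for name, text in samples.items():
        print(name, widest_utf8_char(text), needs_four_byte_utf8(text))
```

Only strings for which `needs_four_byte_utf8` is true will exercise a 3-byte limit like the one described above.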

Synchro
3

First, read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets".

Make sure your application can handle Turkish. It has several quirks that break applications that assume English rules. Because there are four kinds of letter "i" (dotted and dot-less, upper and lower case), applications that assume uppercase(i) => I will break when using Turkish rules, where uppercase(i) => İ.

A common thing to do is check if the user typed the command "exit" by using lowercase(userInput) == "exit" or uppercase(userInput) == "EXIT". This works as expected under English rules but will fail under Turkish rules, where "exıt" != "exit" and "EXİT" != "EXIT". To do this correctly, use a case-insensitive comparison routine that does not depend on the current locale (an ordinal or invariant comparison); these are built into all modern languages.
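
Here is a Python sketch of the failure and the fix. Python's own `str.upper()` and `str.lower()` never apply Turkish rules, so `turkish_upper` and `turkish_lower` below are hypothetical helpers that simulate what a locale-aware runtime does under a Turkish locale.

```python
def turkish_upper(s: str) -> str:
    """Simulated Turkish uppercasing: i -> İ (U+0130), ı (U+0131) -> I."""
    return s.replace("i", "\u0130").replace("\u0131", "I").upper()


def turkish_lower(s: str) -> str:
    """Simulated Turkish lowercasing: I -> ı (U+0131), İ (U+0130) -> i."""
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


def is_exit_command(user_input: str) -> bool:
    """Locale-independent, case-insensitive check for the "exit" command."""
    return user_input.casefold() == "exit"


if __name__ == "__main__":
    # The naive checks break once Turkish casing rules are in effect.
    print(turkish_lower("EXIT") == "exit")
    print(turkish_upper("exit") == "EXIT")
    # The locale-independent comparison still works.
    print(is_exit_command("EXIT"), is_exit_command("Exit"))
```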

Kevin Panko
1

I was thinking about this question from a completely different angle. I can't recall exactly what we did, but on a previous project I think we wound up changing the Regional Settings (in the Regional and Language Options control panel?) to help us ensure the localized strings were working.

Ogre Psalm33