Most programming languages have some support for Unicode, but all of them have more or less documented corner cases where things don't work correctly.
Examples
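Java: reverse() in StringBuilder/StringBuffer works correctly (it keeps surrogate pairs together). But length(), charAt(), etc. in String do not if a character needs more than 16 bits to encode, because they count UTF-16 code units rather than characters.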
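A minimal Java sketch of this behaviour. It assumes a string containing U+1D11E (MUSICAL SYMBOL G CLEF), which lies outside the Basic Multilingual Plane and is therefore stored as two UTF-16 code units. The helper names `codeUnits` and `codePoints` are illustrative only. The calls they wrap (`String.length()`, `String.codePointCount()`, `String.codePointAt()`, `StringBuilder.reverse()`) are standard JDK methods.

```java
public class Utf16Demo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points, counting each surrogate pair once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E is outside the BMP, so it is encoded as the pair D834 DD1E.
        String s = "a\uD834\uDD1Eb";

        System.out.println(codeUnits(s));   // 4
        System.out.println(codePoints(s));  // 3

        // charAt(1) returns only the high surrogate, not the whole character.
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true

        // codePointAt(1) decodes the full surrogate pair.
        System.out.println(Integer.toHexString(s.codePointAt(1)));   // 1d11e

        // StringBuilder.reverse() treats a surrogate pair as a single unit.
        String reversed = new StringBuilder(s).reverse().toString();
        System.out.println(reversed.equals("b\uD834\uDD1Ea"));      // true
    }
}
```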
C#: I didn't find a correct reverse method, and Length and indexed access return wrong results.
Perl: Same problem.
PHP: Has no notion of Unicode at all; the mbstring extension provides some better-working replacements.
Is there a programming language that has full and correct Unicode support? If so, what compromises had to be made to achieve it?
- More complex algorithms?
- Higher memory consumption?
- Slower performance?
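One place where "more complex algorithms" may be unavoidable: even code-point-correct handling is not enough when a single user-perceived character spans several code points, such as a letter followed by a combining mark. The sketch below assumes the JDK's `java.text.BreakIterator` character instance as the segmentation rule; `graphemes` and `reverseGraphemes` are hypothetical helpers, and how fully the JDK covers extended grapheme clusters (e.g. emoji sequences) may vary between JDK versions.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GraphemeDemo {
    // Split s into user-perceived characters using the JDK's BreakIterator.
    static List<String> graphemes(String s) {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(s.substring(start, end));
        }
        return out;
    }

    // Reverse by grapheme, so combining marks stay attached to their base letter.
    static String reverseGraphemes(String s) {
        List<String> parts = graphemes(s);
        Collections.reverse(parts);
        return String.join("", parts);
    }

    public static void main(String[] args) {
        // "noël" written with a combining diaeresis (U+0308) after the "e".
        String s = "noe\u0308l";

        // Code-unit reversal moves the diaeresis onto the "l".
        System.out.println(new StringBuilder(s).reverse());

        // Grapheme reversal keeps "e" + U+0308 together.
        System.out.println(reverseGraphemes(s).equals("le\u0308on")); // true
        System.out.println(graphemes(s).size());                      // 4
    }
}
```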
How was it implemented internally?
- Arrays of ints, linked lists, etc.?
- Additional buffering?
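A sketch of the "array of ints" option in Java, assuming each element holds exactly one code point (essentially UTF-32). Indexing by code point becomes constant-time, at the cost of 4 bytes per code point instead of 2 bytes per UTF-16 unit. `toCodePoints` and `fromCodePoints` are hypothetical helpers built on the standard `String.codePoints()` method and the `String(int[], int, int)` constructor.

```java
public class CodePointArray {
    // Hypothetical helper: decode a UTF-16 String into one int per code point.
    static int[] toCodePoints(String s) {
        return s.codePoints().toArray();
    }

    // Hypothetical helper: re-encode code points back into a UTF-16 String.
    static String fromCodePoints(int[] cps) {
        return new String(cps, 0, cps.length);
    }

    public static void main(String[] args) {
        String s = "a\uD834\uDD1Eb"; // U+1D11E needs a surrogate pair in UTF-16

        int[] cps = toCodePoints(s);
        System.out.println(cps.length);                  // 3, one slot per code point
        System.out.println(Integer.toHexString(cps[1])); // 1d11e, direct index access

        // Payload: 3 ints * 4 bytes = 12 bytes, versus 4 chars * 2 bytes = 8 bytes.
        System.out.println(fromCodePoints(cps).equals(s)); // true, lossless round trip
    }
}
```

The trade-off is the one the question anticipates: simpler indexing in exchange for higher memory use and a conversion step at every boundary with UTF-16 APIs.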
I saw that Python 3 had some pretty big changes in this area. How close is Python 3 now to a correct implementation?