i'm need to write a function that will flip all the characters of a string left-to-right.
e.g.:
Thė quiçk ḇrown fox jumṕềᶁ ovểr thë lⱥzy ȡog.
should become
.goȡ yzⱥl ëht rểvo ᶁềṕmuj xof nworḇ kçiuq ėhT
i can limit the question to UTF-16 (which has the same problems as UTF-8, just less often).
Naive solution
A naive solution might try to flip all the things (e.g. word-for-word, where a word is 16-bits - i would have said byte for byte if we could assume that a byte was 16-bits. i could also say character-for-character where character is the data type Char
which represents a single code-point):
String original = "ɗỉf̴ḟếr̆ęnͥt";
String flipped = "";
foreach (Char c in s)
{
flipped = c+fipped;
}
Results in the incorrectly flipped text:
ɗỉf̴ḟếr̆ęnͥt
̨tͥnę̆rếḟ̴fỉɗ
This is because one "character" takes multiple "code points".
ɗỉf̴ḟếr̆ęnͥt
ɗ
ỉ
f
˜
ḟ
ế
r
˘
ę
n
i
t
˛
and flipping each "code point" gives:
˛
t
i
n
ę
˘
r
ế
ḟ
˜
f
ỉ
ɗ
Which not only is not a valid UTF-16 encoding, it's not the same characters.
Failure
The problem happens in UTF-16 encoding when there is:
- combining diacritics
- characters in another lingual plane
Those same issues happen in UTF-8 encoding, with the additional case
- any character outside the 0..127 ASCII range
i can limit myself to the simpler UTF-16 encoding (since that's the encoding that the language that i'm using has (e.g. C#, Delphi)
The problem, it seems to me, is discovering if a number of subsequent code points are combining characters, and need to come along with the base glyph.
It's also fun to watch an online text reverser site fail to take this into account.
Note:
- any solution should assume that don't have access to a UTF-32 encoding library (mainly becuase i don't have access to any UTF-32 encoding library)
- access to a UTF-32 encoding library would solve the UTF-8/UTF-16 lingual planes problem, but not the combining diacritics problem