How to flip text horizontally?

Question

i need to write a function that will flip all the characters of a string left-to-right.

e.g.:

Thė quiçk ḇrown fox jumṕềᶁ ovểr thë lⱥzy ȡog.

should become

.goȡ yzⱥl ëht rểvo ᶁềṕmuj xof nworḇ kçiuq ėhT

i can limit the question to UTF-16 (which has the same problems as UTF-8, just less often).

Naive solution

A naive solution might try to flip all the things (e.g. word-for-word, where a word is 16 bits; i would have said byte-for-byte if we could assume that a byte was 16 bits. i could also say character-for-character, where character is the data type Char, which holds a single 16-bit UTF-16 code unit):

String original = "ɗỉf̴ḟếr̆ęnͥt";
String flipped = "";
// Prepend each 16-bit Char, so the last one ends up first
foreach (Char c in original)
{
   flipped = c + flipped;
}

Results in the incorrectly flipped text:

ɗỉf̴ḟếr̆ęnͥt
̨tͥnę̆rếḟ̴fỉɗ

This is because one "character" takes multiple "code points".

ɗỉf̴ḟếr̆ęnͥt
ɗ ỉ f ˜ ḟ ế r ˘ ę n i t ˛

and flipping each "code point" gives:

˛ t i n ę ˘ r ế ḟ ˜ f ỉ ɗ

Which not only is not a valid UTF-16 encoding, it's not the same characters.

Failure

The problem happens in UTF-16 encoding when there is:

combining diacritics
characters outside the Basic Multilingual Plane (stored as surrogate pairs)

Those same issues happen in UTF-8 encoding, with the additional case

any character outside the 0..127 ASCII range
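
To make these failure cases concrete, here is a small sketch in Python (used only for illustration; the helper name `unit_counts` is made up) that counts how many code points, UTF-16 code units, and UTF-8 bytes a piece of text occupies. Any text where those numbers differ from 1 is text that a unit-by-unit flip can break.

```python
def unit_counts(text):
    """Return (code points, UTF-16 code units, UTF-8 bytes) for text."""
    code_points = len(text)                            # Python str is indexed by code point
    utf16_units = len(text.encode("utf-16-le")) // 2   # 2 bytes per 16-bit code unit
    utf8_bytes = len(text.encode("utf-8"))
    return code_points, utf16_units, utf8_bytes

samples = {
    "precomposed e-acute": "\u00e9",     # (1, 1, 2): only UTF-8 needs more than one unit
    "e + combining acute": "e\u0301",    # (2, 2, 3): combining diacritic, breaks in every encoding
    "musical G clef": "\U0001d11e",      # (1, 2, 4): outside the BMP, surrogate pair in UTF-16
}
for label, text in samples.items():
    print(label, unit_counts(text))
```

The middle sample is the awkward one: it is two code points even in UTF-32, so no choice of encoding makes it go away.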

i can limit myself to the simpler UTF-16 encoding, since that's the encoding that the language i'm using has (e.g. C#, Delphi).

The problem, it seems to me, is discovering if a number of subsequent code points are combining characters, and need to come along with the base glyph.

It's also fun to watch an online text reverser site fail to take this into account.

Note:

any solution should assume that i don't have access to a UTF-32 encoding library (mainly because i don't have access to any UTF-32 encoding library)

access to a UTF-32 encoding library would solve the UTF-8/UTF-16 problem with characters outside the Basic Multilingual Plane, but not the combining diacritics problem

Interesting question but why are you trying to flip the string? Lots of modern environments for displaying text can display it right-to-left for you. What are you trying to do? — David Grayson, Jan 24 '12 at 16:50

bobince · Accepted Answer · 2012-01-24T22:26:18.777

3

The term you're looking for is “grapheme cluster”, as defined in Unicode TR29 (Grapheme Cluster Boundaries).

Group the UTF-16 code units into Unicode code points (=characters) using the surrogate algorithm (easy), then group the characters into grapheme clusters using the Grapheme_Cluster_Break rules. Finally reverse the group order.
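
As a rough sketch of the first (easy) step only, here is the surrogate grouping written in Python over a plain list of 16-bit integers, which stands in for an array of C# or Delphi chars. The function name is made up for illustration, and passing unpaired surrogates through unchanged is an assumption rather than anything the answer specifies. The grapheme cluster step is not attempted here.

```python
def utf16_units_to_code_points(units):
    """Group 16-bit UTF-16 code units into code points (surrogate algorithm)."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            # 10 bits from the high surrogate, 10 bits from the low one
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character, or an unpaired surrogate passed through unchanged
            code_points.append(unit)
            i += 1
    return code_points

units = [0x0041, 0xD834, 0xDD1E, 0x0042]   # "A", U+1D11E as a surrogate pair, "B"
print([hex(cp) for cp in utf16_units_to_code_points(units)])
# ['0x41', '0x1d11e', '0x42']
```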

You will need a copy of the Unicode character database in order to recognise grapheme cluster boundaries. That's already going to take up a considerable amount of space, so you're probably going to want to get a library to do it. For example in ICU you might use a character BreakIterator (created with BreakIterator::createCharacterInstance, and misleadingly named, as it works on grapheme clusters, not ‘characters’ as Unicode knows it).

edited Jan 24 '12 at 22:26

answered Jan 24 '12 at 22:17

Sounds like a good avenue to pursue - but my god it looks well over my head. – Ian Boyd Apr 30 '12 at 17:51

Yeah, you certainly wouldn't want to do it from scratch! But it's generally not too bad with a library to throw at it. – bobince Apr 30 '12 at 21:57

score 2 · Answer 2 · answered Jan 24 '12 at 18:20

If you work in UTF-32, you solve the non-base-plane issue. Converting from UTF-8 or UTF-16 to UTF-32 (and back) is relatively simple bit twiddling (see Wikipedia). You don't have to have a library for it.
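
For the "and back" direction, a minimal sketch of the bit twiddling might look like this in Python. A UTF-32 value is just the code point as an integer, so the input is a list of integers. The function name is invented for illustration and no validation of the input range is attempted.

```python
def code_points_to_utf16_units(code_points):
    """Encode code points (UTF-32 values) as 16-bit UTF-16 code units."""
    units = []
    for cp in code_points:
        if cp >= 0x10000:
            offset = cp - 0x10000                     # 20 bits remain
            units.append(0xD800 + (offset >> 10))     # high surrogate: top 10 bits
            units.append(0xDC00 + (offset & 0x3FF))   # low surrogate: bottom 10 bits
        else:
            units.append(cp)                          # BMP code points are stored as-is
    return units

print([hex(u) for u in code_points_to_utf16_units([0x41, 0x1D11E])])
# ['0x41', '0xd834', '0xdd1e']
```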

Most of the combining characters are in a few ranges. You could determine those ranges by scanning the Unicode database (see Unicode.org). Hardcode those ranges into your application. With that, you can determine the groups of codepoints that represent a single character. (The drawback is that new combining marks could be introduced in the future, and you'd need to update your table.)

Segment appropriately, reverse the order (segment by segment), and convert back to UTF-8 or UTF-16 (or whatever you want).
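
A minimal sketch of the segment-and-reverse idea, in Python. Two things differ from the description above and should be read as assumptions: instead of hardcoded ranges it asks Python's bundled `unicodedata` module whether a code point has a mark category (Mn, Mc, Me), and it works on a Python `str`, which is already a sequence of code points, so the surrogate step is not shown. It is only an approximation of grapheme clusters: Hangul jamo, emoji joined with zero-width joiners, and flag sequences are not handled. The function name is made up.

```python
import unicodedata

def reverse_by_combining_marks(text):
    """Reverse text, keeping each combining mark attached to its base character."""
    segments = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and segments:
            segments[-1] += ch       # the mark travels with the preceding base
        else:
            segments.append(ch)      # start a new segment (also covers a leading mark)
    return "".join(reversed(segments))

original = "a\u0308bc\u0327"         # a + diaeresis, b, c + cedilla
flipped = reverse_by_combining_marks(original)
print(flipped == "c\u0327ba\u0308")  # True
```

In a UTF-16 language the same loop would run over code points produced by the surrogate grouping, and the result would be encoded back afterwards.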

UTF-32 still has the gotcha that *not every character is 4-bytes*. Still have the problems of diacritics. — Ian Boyd, Apr 30 '12 at 17:50
@IanBoyd: That's the point I was attempting to make in the second paragraph. — Adrian McCarthy, Apr 30 '12 at 18:18

score -1 · Answer 3 · answered Apr 09 '13 at 00:06


Text Mechanic's Text Generator seems to do this in JavaScript. I'm sure it would be possible to translate the JS into another language after obtaining the author's consent (if you can find a 'contact' link for that site).

Agi Hammerthief

It doesn't reverse properly (e.g. it generates `tͥnę̆rếḟ̴fỉɗ`) – Ian Boyd Apr 10 '13 at 18:07
