I have a question about extended grapheme clusters. For example, consider the following code:

let message = "c\u{0327}a va bien" // => "ça va bien" 

How does Swift know it needs to be combined (i.e. ç) rather than treating it as a small letter c AND a "COMBINING CEDILLA"?

Eric Aya
JohnnL
  • It's part of the Unicode standard; see https://developer.apple.com/library/content/qa/qa1235/_index.html. Is there a reason you want to know other than curiosity? – Code Different Dec 30 '17 at 03:15
  • Hey, thanks for the link. I was just curious about how it works. One possible scenario would be text like "combined c ̧ result will be ç", where the first " ̧ " after c is not combined with c into "ç". I was wondering how that can be done in Swift. – JohnnL Jan 01 '18 at 23:32
  • In other words, how can I get the two decomposed Unicode characters to display literally as two characters rather than one single character? – JohnnL Jan 01 '18 at 23:38

1 Answer

Use the unicodeScalars view on the string:

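import Foundation // provides decomposedStringWithCanonicalMapping and precomposedStringWithCanonicalMapping

let message1 = "c\u{0327}".decomposedStringWithCanonicalMapping
for scalar in message1.unicodeScalars {
    print(scalar) // prints "c" and the combining cedilla as two separate scalars
}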

let message2 = "c\u{0327}".precomposedStringWithCanonicalMapping
for scalar in message2.unicodeScalars {
    print(scalar) // prints "ç" (LATIN SMALL LETTER C WITH CEDILLA), a single scalar
}
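
As a rough illustration of why Swift shows one character: a Swift Character is an extended grapheme cluster, so a base letter followed by a combining mark counts as one Character even though it is stored as two scalars. The sketch below also tries two ways of showing the cedilla on its own, which is my own guess at what the comments ask for: putting a space in front of the combining mark, or using the spacing character U+00B8 (CEDILLA). The variable names are made up for the example.

```swift
let decomposed = "c\u{0327}"   // U+0063 followed by U+0327 COMBINING CEDILLA
let precomposed = "\u{00E7}"   // U+00E7 LATIN SMALL LETTER C WITH CEDILLA

print(decomposed.count)                  // 1 (one grapheme cluster)
print(decomposed.unicodeScalars.count)   // 2 (two scalars underneath)
print(precomposed.unicodeScalars.count)  // 1
print(decomposed == precomposed)         // true (String equality uses canonical equivalence)

// Assumption: to keep the mark from attaching to "c", give it another base.
let withSpace = "c \u{0327}"    // the cedilla now combines with the space
let withSpacing = "c\u{00B8}"   // U+00B8 is a standalone, non-combining cedilla

print(withSpace.count)    // 2
print(withSpacing.count)  // 2
```

Neither variant changes how Swift segments text; they only change which scalars are in the string.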

Note that not all composite characters have a precomposed form, as noted by Apple's Technical Q&A:

Important: Do not convert to precomposed Unicode in an attempt to simplify your text processing. Precomposed Unicode can still contain composite characters. For example, there is no precomposed equivalent of U+0065 U+030A (LATIN SMALL LETTER E followed by COMBINING RING ABOVE)
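
A small check of the quoted example, offered as a sketch rather than anything from the Q&A itself: after precomposition, "e" plus COMBINING RING ABOVE should still be two scalars, while "a" plus the same mark collapses to U+00E5. The constant names are hypothetical.

```swift
import Foundation

// No precomposed code point exists for e + ring above, so both scalars remain.
let eRing = "e\u{030A}".precomposedStringWithCanonicalMapping
// a + ring above does have a precomposed form (U+00E5).
let aRing = "a\u{030A}".precomposedStringWithCanonicalMapping

print(eRing.unicodeScalars.count)  // 2
print(aRing.unicodeScalars.count)  // 1
print(eRing.count)                 // 1 (still a single Character either way)
```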

Code Different