I thought I understood Unicode scalars in Swift pretty well, but the dog face emoji proved me wrong.

for code in "🐶".utf16 {
    print(code)
}

The UTF-16 codes are 55357 and 56374. In hex, that's d83d and dc36.

Now:

let dog = "\u{d83d}\u{dc36}"

Instead of getting a string with "🐶", I'm getting an error:

Invalid unicode scalar

I tried with the UTF-8 codes and that didn't work either. It doesn't throw an error, but it returns "ð¶" instead of the dog face.

What is wrong here?

Robo Robok

1 Answer

The \u{nnnn} escape sequence expects a Unicode scalar value, not the UTF-16 representation (with high and low surrogates):

for code in "🐶".unicodeScalars {
    print(String(code.value, radix: 16))
}
// 1f436

let dog = "\u{1F436}"
print(dog) // 🐶
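
To see how the two surrogates relate to the single scalar value, here is a minimal sketch of the standard UTF-16 surrogate-pair arithmetic. The helper name `scalarValue(high:low:)` is hypothetical and only for illustration; it is not part of the Swift standard library.

```swift
// Hypothetical helper (not a standard library function): combines a
// UTF-16 surrogate pair into the Unicode scalar value it encodes.
func scalarValue(high: UInt16, low: UInt16) -> UInt32? {
    // High surrogates are 0xD800...0xDBFF, low surrogates are 0xDC00...0xDFFF.
    guard (0xD800...0xDBFF).contains(high),
          (0xDC00...0xDFFF).contains(low) else { return nil }
    // Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((UInt32(high) - 0xD800) << 10) + (UInt32(low) - 0xDC00)
}

if let value = scalarValue(high: 0xd83d, low: 0xdc36),
   let scalar = Unicode.Scalar(value) {
    print(String(value, radix: 16))  // 1f436
    print(String(Character(scalar))) // 🐶
}
```

This is why `\u{d83d}` on its own is rejected: a lone surrogate is only half of an encoded value and is not a valid Unicode scalar.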

Solutions for reconstructing a string from its UTF-16 representation can be found in the question "Is there a way to create a String from utf16 array in swift?". For example:

import Foundation // String(utf16CodeUnits:count:) is provided by Foundation

let utf16: [UInt16] = [ 0xd83d, 0xdc36 ]
let dog = String(utf16CodeUnits: utf16, count: utf16.count)
print(dog) // 🐶
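
As a possible alternative that does not need Foundation, the standard library's `String(decoding:as:)` initializer can decode code units directly. The sketch below also covers the UTF-8 attempt from the question; the four bytes shown are the UTF-8 encoding of U+1F436, and the variable names are only illustrative.

```swift
// Decode the UTF-16 code units (a surrogate pair) into a String.
let utf16Units: [UInt16] = [0xd83d, 0xdc36]
let fromUTF16 = String(decoding: utf16Units, as: UTF16.self)

// Decode the UTF-8 bytes of the same character.
// Writing "\u{f0}\u{9f}\u{90}\u{b6}" instead would create four separate
// scalars (U+00F0, U+009F, U+0090, U+00B6), which explains the "ð¶" output.
let utf8Bytes: [UInt8] = [0xf0, 0x9f, 0x90, 0xb6]
let fromUTF8 = String(decoding: utf8Bytes, as: UTF8.self)

print(fromUTF16 == "\u{1F436}") // true
print(fromUTF8 == "\u{1F436}")  // true
```

Note that `String(decoding:as:)` repairs invalid input by substituting U+FFFD rather than failing, so it may not be the right choice if malformed data has to be detected.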
Martin R
  • Hi again Martin. I think I misunderstood it, because it works for some characters, like country flags. But it looks like those are special cases of characters that combine with each other, is that right? – Robo Robok Jan 23 '19 at 10:04
  • @RoboRobok: Flags are “extended grapheme clusters” – a sequence of Unicode scalar values which are considered as a single `Character` in Swift. – Martin R Jan 23 '19 at 10:06
  • Exactly. So now I know it’s a different animal, if you pardon the pun – Robo Robok Jan 23 '19 at 10:08
  • @RoboRobok: What you did would not work with country flags either. The UTF-16 representation of "🇵🇱" is (hex) d83c ddf5 d83c ddf1, but `"\u{d83c}\u{ddf5}\u{d83c}\u{ddf1}"` does not compile. It has to be `"\u{1f1f5}\u{1f1f1}"`. – Of course there are ways to reconstruct a string from a UTF-16 sequence, compare https://stackoverflow.com/questions/24542170/is-there-a-way-to-create-a-string-from-utf16-array-in-swift. – Martin R Jan 23 '19 at 10:14
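
To illustrate the point about flags from the comments, here is a small sketch comparing the different views of the flag string. It assumes a Swift version that treats a pair of regional indicator scalars as a single grapheme cluster (Swift 4 or later); the constant name is only illustrative.

```swift
// Two regional indicator scalars (P and L) that render as one flag.
let flag = "\u{1f1f5}\u{1f1f1}"

print(flag.count)                // 1 (one Character, i.e. one grapheme cluster)
print(flag.unicodeScalars.count) // 2 (two Unicode scalar values)
print(flag.utf16.count)          // 4 (each scalar needs a surrogate pair)

// The scalar values are what the \u{...} escape expects.
for scalar in flag.unicodeScalars {
    print(String(scalar.value, radix: 16)) // 1f1f5, then 1f1f1
}
```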