The following example is taken from the Strings and Characters documentation:

[Image: example from the Strings and Characters documentation]

The values 55357 (U+D83D in hex) and 56374 (U+DC36 in hex) are the surrogate pair that forms the Unicode scalar U+1F436, which is the DOG FACE character. Is there any way to go the other direction? That is, can I convert a surrogate pair into a scalar?

I tried

let myChar: Character = "\u{D83D}\u{DC36}"

but I got an "Invalid Unicode scalar" error.

This Objective C answer and this project seem to be custom solutions, but is there anything built into Swift (especially Swift 2.0+) that does this?

Suragch
  • Specify the code point directly: `\u{1F436}`. There is an example in the document you link to: `let sparklingHeart = "\u{1F496}" // 💖, Unicode scalar U+1F496` – nhahtdh Jul 08 '15 at 04:54
  • What if I don't know the full code point? That is, what if I only know the surrogate pairs? – Suragch Jul 08 '15 at 05:01
  • `String` has a `init?(_ utf16: String.UTF16View)` method, but I haven't found yet how to *create* a `String.UTF16View` from a given array. – A similar question (with possible solutions) is here: [Is there a way to create a String from utf16 array in swift?](http://stackoverflow.com/questions/24542170/is-there-a-way-to-create-a-string-from-utf16-array-in-swift). – Martin R Jul 08 '15 at 05:16
  • @MartinR oops sorry see you’ve already given the utf16 decoding answer. Though you might also find the 2.0 version interesting. – Airspeed Velocity Jul 08 '15 at 07:07
  • @AirspeedVelocity: The code from http://stackoverflow.com/a/24757284/1187415 works for Swift 2 as well. It is a bit longer because it distinguishes between "end of input" and "error" condition. – Martin R Jul 08 '15 at 07:16
  • @Suragch You seem to be asking how to calculate the code point based on a surrogate pair. I’ve answered that here: [http://stackoverflow.com/a/31287075](http://stackoverflow.com/a/31287075/96656) – Mathias Bynens Jul 08 '15 at 08:17

2 Answers

There are formulas to calculate the original code point based on a surrogate pair and vice versa. From https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae:

Section 3.7 of The Unicode Standard 3.0 defines the algorithms for converting to and from surrogate pairs.

A code point C greater than 0xFFFF corresponds to a surrogate pair <H, L> as per the following formula:

H = Math.floor((C - 0x10000) / 0x400) + 0xD800
L = (C - 0x10000) % 0x400 + 0xDC00

The reverse mapping, i.e. from a surrogate pair <H, L> to a Unicode code point C, is given by:

C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000
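
As a rough Swift translation of the two formulas above (a sketch assuming Swift 5; the function names `scalar(fromHigh:low:)` and `surrogatePair(for:)` are hypothetical helpers, not part of the standard library):

```swift
// Hypothetical helpers for illustration; not standard library API.

/// Combines a UTF-16 surrogate pair into a Unicode scalar.
/// Returns nil if either code unit is outside its surrogate range.
func scalar(fromHigh high: UInt16, low: UInt16) -> Unicode.Scalar? {
    guard (0xD800...0xDBFF).contains(high),
          (0xDC00...0xDFFF).contains(low) else {
        return nil
    }
    // C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000
    let codePoint = (UInt32(high) - 0xD800) * 0x400
        + (UInt32(low) - 0xDC00) + 0x10000
    return Unicode.Scalar(codePoint)
}

/// Splits a scalar above U+FFFF into its surrogate pair.
/// Returns nil for scalars that fit in a single UTF-16 code unit.
func surrogatePair(for scalar: Unicode.Scalar) -> (high: UInt16, low: UInt16)? {
    guard scalar.value > 0xFFFF else { return nil }
    let offset = scalar.value - 0x10000
    // H = floor((C - 0x10000) / 0x400) + 0xD800
    let high = UInt16(offset / 0x400 + 0xD800)
    // L = (C - 0x10000) % 0x400 + 0xDC00
    let low = UInt16(offset % 0x400 + 0xDC00)
    return (high, low)
}
```

With the values from the question, `scalar(fromHigh: 0xD83D, low: 0xDC36)` should yield U+1F436, which can then be wrapped in a `Character` or `String`.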
Mathias Bynens

Given a sequence of UTF-16 code units (i.e. 16-bit numbers, such as you get from String.utf16 or just an array of numbers), you can use the UTF16 type and its decode method to turn it into UnicodeScalars, which you can then convert into a String.

It’s a bit of a grungy API: it takes a generator (because it does stateful processing) and returns an enum that indicates a result (with the scalar as its associated value), an error, or completion. Swift 2.0 pattern matching makes it a lot easier to use:

let u16data: [UInt16] = [0xD83D,0xDC36]
//or let u16data = "Hello, ".utf16

var g = u16data.generate()
var s: String = ""
var utf16 = UTF16()
while case let .Result(scalar) = utf16.decode(&g) {
    print(scalar, &s)
}
print(s) // prints 🐶
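
The snippet above targets Swift 2.0. For later Swift versions, a sketch of the same idea might look like the following (assuming Swift 5, where `generate()` became `makeIterator()` and the result case is `.scalarValue`; the variable names `lenient` and `strict` are only illustrative):

```swift
let u16data: [UInt16] = [0xD83D, 0xDC36]

// Lenient route: ill-formed sequences are repaired with U+FFFD.
let lenient = String(decoding: u16data, as: UTF16.self)

// Stricter route: decode scalar by scalar and stop at the first
// error or at the end of input, mirroring the loop above.
var iterator = u16data.makeIterator()
var decoder = UTF16()
var strict = ""
while case .scalarValue(let scalar) = decoder.decode(&iterator) {
    strict.unicodeScalars.append(scalar)
}

print(lenient)
print(strict)
```

Like the original loop, the second variant does not distinguish between "end of input" and "error"; switching over all three cases of the decoding result would be needed for that.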
Airspeed Velocity
  • It took me a little while to learn some of the new concepts (1. [decode method](https://developer.apple.com/library/prerelease/ios/documentation/Swift/Reference/Swift_UTF16_Structure/index.html), 2. generator ([here](https://en.wikipedia.org/wiki/Generator_(computer_programming)) and [here](http://devsmash.com/blog/whats-the-big-deal-with-generators)), 3. [stateful](http://programmers.stackexchange.com/a/154499/186547)), but this was a useful answer. I guess the answer to my original question is no, there is nothing built in to Swift to do this directly, but it is not too hard to generate. – Suragch Jul 08 '15 at 21:15