
I've been messing around a bit with String indices lately. I had a hard time figuring things out, and something still bugs me.

I'm trying to use String.Index's init(_:within:) initializer. It works great when I pass a UTF-16 index that lies inside the bounds of the string, but when it's outside, it crashes with this message:

fatal error: Invalid String.UTF16Index for this UnicodeScalar view

Now I get that it's a requirement of the function, as stated in the doc:

/// - Requires: `utf16Index` is an element of
///   `characters.utf16.indices`.

Actual question: What I don't get is why it crashes when this init is a failable initializer. Shouldn't it return nil?

I'll probably write a method that checks whether the index actually lies within the string before using it, but it still seems strange to me.
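
For reference, this is roughly the kind of check I have in mind (just a sketch; the helper name and the bounds test are mine, based on the documented requirement):

func characterIndexIfValid(utf16Index: String.UTF16Index, within string: String) -> String.Index? {
    // Only forward the index to the failable initializer when it is one of
    // characters.utf16's valid indices, as the documentation requires.
    if utf16Index >= string.utf16.startIndex && utf16Index < string.utf16.endIndex {
        return String.Index(utf16Index, within: string)
    }
    return nil
}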

Fantattitude
  • Actually I overlooked that requirement when writing http://stackoverflow.com/a/30404532/1187415; I fixed it now, after reading your question. – Martin R Aug 10 '15 at 10:14
  • @MartinR Pretty funny since my question comes originally from me implementing your solution to this question! – Fantattitude Aug 10 '15 at 10:15

1 Answer


The full header documentation for that method is

extension String.CharacterView.Index {

    // ...

    public init?(_ unicodeScalarIndex: UnicodeScalarIndex, within characters: String)
    /// Construct the position in `characters` that corresponds exactly to
    /// `utf16Index`. If no such position exists, the result is `nil`.
    ///
    /// - Requires: `utf16Index` is an element of
    ///   `characters.utf16.indices`.
    public init?(_ utf16Index: UTF16Index, within characters: String)

    // ...

}

So there are two different failure reasons:

  • The given utf16Index is outside of the range of valid indices of characters.utf16. This violates the requirement and causes a runtime exception.
  • The given utf16Index is a valid index of characters.utf16, but there is no character position corresponding to that index. In that case the method returns nil.

As an example, consider the string "a👿b". It consists of three characters, but four UTF-16 code units:

let str = "a👿b"
str.characters.count // 3
str.utf16.count // 4
Array(str.utf16) // [97, 55357, 56447, 98]

(See also Strings in Swift 2 in the Swift blog.)

The UTF-16 indices 0, 1, and 3 correspond to valid character positions, but 2 does not:

String.Index(str.utf16.startIndex, within: str) // 0
String.Index(str.utf16.startIndex + 1, within: str) // 1
String.Index(str.utf16.startIndex + 2, within: str) // nil
String.Index(str.utf16.startIndex + 3, within: str) // 3

Actually, the "one past the end" position (utf16.endIndex) is also valid (which is not apparent to me from the header documentation); in that case characters.endIndex is returned:

String.Index(str.utf16.startIndex + 4, within: str) // 4
str.characters.endIndex // 4

But everything beyond endIndex causes a runtime exception:

String.Index(str.utf16.startIndex + 5, within: str) // EXC_BAD_INSTRUCTION

To compute a UTF-16 index that stays within the valid bounds, you can use the 3-parameter form of advance():

let i16 = advance(str.utf16.startIndex, offset, str.utf16.endIndex)
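
You could wrap both steps in a small helper, for example (just a sketch, the name is arbitrary): clamping the offset with advance() keeps the initializer's requirement satisfied, and the initializer itself then takes care of positions that fall inside a surrogate pair by returning nil.

func characterIndex(utf16Offset offset: Int, within string: String) -> String.Index? {
    // Clamp the offset to the UTF-16 bounds so the initializer's requirement is met.
    let utf16Index = advance(string.utf16.startIndex, offset, string.utf16.endIndex)
    // Returns nil if utf16Index points into the middle of a surrogate pair.
    return String.Index(utf16Index, within: string)
}

characterIndex(utf16Offset: 2, within: str) // nil (inside the surrogate pair)
characterIndex(utf16Offset: 9, within: str) // 4 (clamped to utf16.endIndex)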
Martin R
  • I was pretty sure it had something to do with characters taking more than one UTF-16 code unit, and your very thorough answer explained it well. Thanks a lot ;) – Fantattitude Aug 10 '15 at 10:13