Which Swift character count should I use when interacting with NSString APIs?

Question

Occasionally I need to use an API that relies on NSString/NSRange behind the scenes, but most of my code is in Swift.

When I need to provide an index (or range), which Swift character count should I use?

For example, given this function:

import Foundation

func replace(_ string: String, characterAtIndex characterIndex: Int) -> String {
  let regex = try! NSRegularExpression(pattern: ".", options: [])
  let range = NSRange(location: characterIndex, length: 1)
  let mutableString = NSMutableString(string: string)
  regex.replaceMatches(in: mutableString, options: [], range: range, withTemplate: "!")
  return mutableString as String
}

Which of the six different ways of getting a string's character count should I use?

Related: [Swift extract regex matches](https://stackoverflow.com/q/27880650) — Martin R, Jul 20 '18 at 06:59

Senseful · Accepted Answer · 2018-07-20T06:48:45.240

TL;DR

The documentation for NSString.length specifies:

The number of UTF-16 code units in the receiver.

Thus, if you want to interop between String and NSString:

You should use string.utf16.count, and it will match up perfectly with (string as NSString).length.
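As a minimal sketch of that equivalence (the names `sample` and `fullRange` are only illustrative and do not come from the original post), the following compares the two lengths for a string containing an emoji that needs a surrogate pair in UTF-16:

```swift
import Foundation

// "\u{1F468}" lies outside the Basic Multilingual Plane,
// so UTF-16 encodes it as a surrogate pair (two code units).
let sample = "a\u{1F468}b"

let nsLength = (sample as NSString).length   // UTF-16 code units, as NSString sees them
let utf16Count = sample.utf16.count          // the same measure from the Swift side

// An NSRange covering the whole string, built from the UTF-16 count.
let fullRange = NSRange(location: 0, length: utf16Count)

print(nsLength, utf16Count, sample.count)    // 4 4 3
```

Here `sample.count` is 3 because Swift counts the emoji as one Character, while both UTF-16 based lengths report 4.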

If you want to count the number of visible characters:

You should use string.count. It matches the number of times you need to press the → (right arrow) key to move the cursor from the beginning of the string to the end.

Note: This is not always 100% accurate, but Apple appears to keep improving the implementation with each release.
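To illustrate what "visible characters" means in practice, here is a small sketch (the strings and names `accented` and `flag` are chosen for illustration only) where one visible character is made of several Unicode scalars, so each view of the string reports a different count:

```swift
// "e" followed by U+0301 (combining acute accent) renders as one character.
let accented = "e\u{301}"
// Two regional indicator symbols that together render as one flag.
let flag = "\u{1F1FA}\u{1F1F8}"

// Order: .count, .unicodeScalars.count, .utf16.count, .utf8.count
print(accented.count, accented.unicodeScalars.count, accented.utf16.count, accented.utf8.count) // 1 2 2 3
print(flag.count, flag.unicodeScalars.count, flag.utf16.count, flag.utf8.count)                 // 1 2 4 8
```

Only `.utf16.count` is safe to hand to an NSString/NSRange API; `.count` answers the separate question of how many characters a user would perceive.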

Here's a Swift 4.0 playground to test a bunch of strings and functions:

let header = "NSString   .utf16❔   encodedOffset❔   NSRange❔   .count❔   .characters❔   distance❔   .unicodeScalars❔   .utf8❔   Description"
var format = "     %3d     %3d ❓            %3d ❓      %3d ❓     %3d ❓          %3d ❓       %3d ❓              %3d ❓    %3d ❓   %@"
format = format.replacingOccurrences(of: "❓", with: "%@") // "❓" acts as a placeholder for "%@" to align the text perfectly

print(header)

test("")
test("abc")
test("❌")
test("")
test("☾test")
test("‍‍‍")
test("\u{200d}\u{200d}\u{200d}")
test("")
test("\u{1F468}")
test("‍♀️‍♂️")
test("你好吗")
test("مرحبا", "Arabic word")
test("م", "Arabic letter")
test("שלום", "Hebrew word")
test("ם", "Hebrew letter")

func test(_ s: String, _ description: String? = nil) {
  func icon(for length: Int) -> String {
    return length == (s as NSString).length ? "✅" : "❌"
  }

  let description = description ?? "'" + s + "'"
  let string = String(
    format: format,
    (s as NSString).length,
    s.utf16.count, icon(for: s.utf16.count),
    s.endIndex.encodedOffset, icon(for: s.endIndex.encodedOffset),
    NSRange(s.startIndex..<s.endIndex, in: s).upperBound, icon(for: NSRange(s.startIndex..<s.endIndex, in: s).upperBound),
    s.count, icon(for: s.count),
    s.characters.count, icon(for: s.characters.count),
    s.distance(from: s.startIndex, to: s.endIndex), icon(for: s.distance(from: s.startIndex, to: s.endIndex)),
    s.unicodeScalars.count, icon(for: s.unicodeScalars.count),
    s.utf8.count, icon(for: s.utf8.count),
    description)
  print(string)
}

And here is the output:

NSString   .utf16❔   encodedOffset❔   NSRange❔   .count❔   .characters❔   distance❔   .unicodeScalars❔   .utf8❔   Description
       0       0 ✅              0 ✅        0 ✅       0 ✅            0 ✅         0 ✅                0 ✅      0 ✅   ''
       3       3 ✅              3 ✅        3 ✅       3 ✅            3 ✅         3 ✅                3 ✅      3 ✅   'abc'
       1       1 ✅              1 ✅        1 ✅       1 ✅            1 ✅         1 ✅                1 ✅      3 ❌   '❌'
       4       4 ✅              4 ✅        4 ✅       1 ❌            1 ❌         1 ❌                2 ❌      8 ❌   ''
       5       5 ✅              5 ✅        5 ✅       5 ✅            5 ✅         5 ✅                5 ✅      7 ❌   '☾test'
      11      11 ✅             11 ✅       11 ✅       1 ❌            1 ❌         1 ❌                7 ❌     25 ❌   '‍‍‍'
      11      11 ✅             11 ✅       11 ✅       1 ❌            1 ❌         1 ❌                7 ❌     25 ❌   '‍‍‍'
       8       8 ✅              8 ✅        8 ✅       4 ❌            4 ❌         4 ❌                4 ❌     16 ❌   ''
       2       2 ✅              2 ✅        2 ✅       1 ❌            1 ❌         1 ❌                1 ❌      4 ❌   ''
      58      58 ✅             58 ✅       58 ✅      13 ❌           13 ❌        13 ❌               32 ❌    122 ❌   '‍♀️‍♂️'
       3       3 ✅              3 ✅        3 ✅       3 ✅            3 ✅         3 ✅                3 ✅      9 ❌   '你好吗'
       5       5 ✅              5 ✅        5 ✅       5 ✅            5 ✅         5 ✅                5 ✅     10 ❌   Arabic word
       1       1 ✅              1 ✅        1 ✅       1 ✅            1 ✅         1 ✅                1 ✅      2 ❌   Arabic letter
       4       4 ✅              4 ✅        4 ✅       4 ✅            4 ✅         4 ✅                4 ✅      8 ❌   Hebrew word
       1       1 ✅              1 ✅        1 ✅       1 ✅            1 ✅         1 ✅                1 ✅      2 ❌   Hebrew letter

Conclusions:

To get a length that is compatible with NSString/NSRange, use (s as NSString).length, s.utf16.count (preferred), s.endIndex.encodedOffset (deprecated in Swift 5 in favor of utf16Offset(in:)), or NSRange(s.startIndex..<s.endIndex, in: s).upperBound.
To get the number of visible characters, use s.count (preferred), s.characters.count (deprecated), or s.distance(from: s.startIndex, to: s.endIndex).
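When only part of a string is involved, the two worlds can be bridged by converting ranges instead of counting by hand. The sketch below uses Foundation's `NSRange(_:in:)` and the failable `Range(_:in:)` initializers; the variable names are illustrative assumptions:

```swift
import Foundation

let text = "a\u{1F468}b"

// Swift side: the range of the last character, expressed as String.Index values.
let lastIndex = text.index(before: text.endIndex)
let swiftRange = lastIndex..<text.endIndex

// Convert to NSRange; location and length are now in UTF-16 code units.
let nsRange = NSRange(swiftRange, in: text)
print(nsRange.location, nsRange.length) // 3 1

// Convert back. The initializer is failable because an arbitrary NSRange
// may not correspond to valid positions in the string.
if let roundTrip = Range(nsRange, in: text) {
  print(text[roundTrip]) // b
}
```

The location is 3 rather than 2 because the emoji before "b" occupies two UTF-16 code units.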

A helpful extension to get the full range of a String:

public extension String {

  var nsrange: NSRange {
    return NSRange(startIndex..<endIndex, in: self)
  }
}

Thus, you can call the original method like so:

replace("‍‍‍", characterAtIndex: "‍‍‍".utf16.count - 1) // ‍‍‍�!
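Note that an index derived from utf16.count addresses a single UTF-16 code unit, so the call above can cut an emoji in half, which is why a replacement character shows up in the result. If the intent is to replace a whole visible character, one possible approach is to locate it with String.Index and convert that range to an NSRange. This is only a sketch: `replaceCharacter(in:at:with:)` is a hypothetical helper, not part of the original answer, and it uses `NSString.replacingCharacters(in:with:)` instead of the regex from the question.

```swift
import Foundation

// Hypothetical variant of the question's function: `characterIndex` counts
// visible characters (as `String.count` does), not UTF-16 code units.
func replaceCharacter(in string: String, at characterIndex: Int, with replacement: String) -> String {
  // Return the input unchanged when the offset does not point at an existing character.
  guard characterIndex >= 0,
        let start = string.index(string.startIndex, offsetBy: characterIndex, limitedBy: string.endIndex),
        start < string.endIndex else {
    return string
  }
  let end = string.index(after: start)

  // The NSRange covers every UTF-16 code unit of the chosen character.
  let nsRange = NSRange(start..<end, in: string)
  return (string as NSString).replacingCharacters(in: nsRange, with: replacement)
}

print(replaceCharacter(in: "a\u{1F468}b", at: 1, with: "!")) // a!b
```

Because the range is derived from Character boundaries, the emoji is replaced as a unit and no dangling surrogate is left behind.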
