Occasionally I need to use an API that relies on NSString/NSRange behind the scenes, but most of my code is in Swift.

When I need to provide an index (or range), which Swift character count should I use?

For example, given this function:

import Foundation

func replace(_ string: String, characterAtIndex characterIndex: Int) -> String {
  let regex = try! NSRegularExpression(pattern: ".", options: [])
  let range = NSRange(location: characterIndex, length: 1)
  let mutableString = NSMutableString(string: string)
  regex.replaceMatches(in: mutableString, options: [], range: range, withTemplate: "!")
  return mutableString as String
}

Which of the six different ways of getting a string's character count should I use?

Senseful

1 Answer

TL;DR

The documentation for NSString.length specifies:

The number of UTF-16 code units in the receiver.

Thus, if you want to interop between String and NSString:

  • You should use string.utf16.count, and it will match up perfectly with (string as NSString).length.
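As a minimal sketch of that equivalence: the sample strings below are my own, written with escaped scalars so they survive copy and paste. The sketch builds the NSString with NSString(string:) rather than the `as NSString` cast, on the assumption that both give the same length.

```swift
import Foundation

// Illustrative samples (not from the original post):
// plain ASCII, one emoji outside the Basic Multilingual Plane, and an accented word.
let samples = ["abc", "\u{1F468}", "caf\u{E9}"]

for s in samples {
    // NSString.length counts UTF-16 code units, and so does String.utf16.count.
    let nsLength = NSString(string: s).length
    print(s, s.utf16.count, nsLength, s.utf16.count == nsLength)
}
```

Each line should end in `true`, including the emoji, which occupies two UTF-16 code units.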

If you want to count the number of visible characters:

  • You should use string.count, which matches the number of times you need to press the right arrow key to move the cursor from the beginning of the string to the end.

    Note: This is not always 100% accurate, because how code points are grouped into characters depends on the Unicode rules built into the Swift version in use, but Apple appears to keep improving the implementation.
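To make the difference between the views concrete, here is a small sketch with two strings of my own choosing. Each is a single visible character but is longer in the other views.

```swift
let accented = "e\u{301}"   // "e" followed by a combining acute accent
let man = "\u{1F468}"       // a single emoji scalar outside the Basic Multilingual Plane

for s in [accented, man] {
    // Characters, Unicode scalars, UTF-16 code units, UTF-8 bytes
    print(s.count, s.unicodeScalars.count, s.utf16.count, s.utf8.count)
}
// prints "1 2 2 3"
// prints "1 1 2 4"
```

Only the third number in each row is a valid length for NSString/NSRange.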


Here's a Swift 4.0 playground to test a bunch of strings and functions:

import Foundation

let header = "NSString   .utf16❔   encodedOffset❔   NSRange❔   .count❔   .characters❔   distance❔   .unicodeScalars❔   .utf8❔   Description"
var format = "     %3d     %3d ❓            %3d ❓      %3d ❓     %3d ❓          %3d ❓       %3d ❓              %3d ❓    %3d ❓   %@"
format = format.replacingOccurrences(of: "❓", with: "%@") // "❓" acts as a placeholder for "%@" to align the text perfectly

print(header)

test("")
test("abc")
test("❌")
test("")
test("☾test")
test("‍‍‍")
test("\u{200d}\u{200d}\u{200d}")
test("")
test("\u{1F468}")
test("‍♀️‍♂️")
test("你好吗")
test("مرحبا", "Arabic word")
test("م", "Arabic letter")
test("שלום", "Hebrew word")
test("ם", "Hebrew letter")

func test(_ s: String, _ description: String? = nil) {
  func icon(for length: Int) -> String {
    return length == (s as NSString).length ? "✅" : "❌"
  }

  let description = description ?? "'" + s + "'"
  let string = String(
    format: format,
    (s as NSString).length,
    s.utf16.count, icon(for: s.utf16.count),
    s.endIndex.encodedOffset, icon(for: s.endIndex.encodedOffset),
    NSRange(s.startIndex..<s.endIndex, in: s).upperBound, icon(for: NSRange(s.startIndex..<s.endIndex, in: s).upperBound),
    s.count, icon(for: s.count),
    s.characters.count, icon(for: s.characters.count),
    s.distance(from: s.startIndex, to: s.endIndex), icon(for: s.distance(from: s.startIndex, to: s.endIndex)),
    s.unicodeScalars.count, icon(for: s.unicodeScalars.count),
    s.utf8.count, icon(for: s.utf8.count),
    description)
  print(string)
}

And here is the output:

NSString   .utf16❔   encodedOffset❔   NSRange❔   .count❔   .characters❔   distance❔   .unicodeScalars❔   .utf8❔   Description
       0       0 ✅              0 ✅        0 ✅       0 ✅            0 ✅         0 ✅                0 ✅      0 ✅   ''
       3       3 ✅              3 ✅        3 ✅       3 ✅            3 ✅         3 ✅                3 ✅      3 ✅   'abc'
       1       1 ✅              1 ✅        1 ✅       1 ✅            1 ✅         1 ✅                1 ✅      3 ❌   '❌'
       4       4 ✅              4 ✅        4 ✅       1 ❌            1 ❌         1 ❌                2 ❌      8 ❌   ''
       5       5 ✅              5 ✅        5 ✅       5 ✅            5 ✅         5 ✅                5 ✅      7 ❌   '☾test'
      11      11 ✅             11 ✅       11 ✅       1 ❌            1 ❌         1 ❌                7 ❌     25 ❌   '‍‍‍'
      11      11 ✅             11 ✅       11 ✅       1 ❌            1 ❌         1 ❌                7 ❌     25 ❌   '‍‍‍'
       8       8 ✅              8 ✅        8 ✅       4 ❌            4 ❌         4 ❌                4 ❌     16 ❌   ''
       2       2 ✅              2 ✅        2 ✅       1 ❌            1 ❌         1 ❌                1 ❌      4 ❌   ''
      58      58 ✅             58 ✅       58 ✅      13 ❌           13 ❌        13 ❌               32 ❌    122 ❌   '‍♀️‍♂️'
       3       3 ✅              3 ✅        3 ✅       3 ✅            3 ✅         3 ✅                3 ✅      9 ❌   '你好吗'
       5       5 ✅              5 ✅        5 ✅       5 ✅            5 ✅         5 ✅                5 ✅     10 ❌   Arabic word
       1       1 ✅              1 ✅        1 ✅       1 ✅            1 ✅         1 ✅                1 ✅      2 ❌   Arabic letter
       4       4 ✅              4 ✅        4 ✅       4 ✅            4 ✅         4 ✅                4 ✅      8 ❌   Hebrew word
       1       1 ✅              1 ✅        1 ✅       1 ✅            1 ✅         1 ✅                1 ✅      2 ❌   Hebrew letter

Conclusions:

  • To get a length that is compatible with NSString/NSRange, use (s as NSString).length, s.utf16.count (preferred), s.endIndex.encodedOffset (deprecated since Swift 5 in favor of utf16Offset(in:)), or NSRange(s.startIndex..<s.endIndex, in: s).length.
  • To get the number of visible characters, use s.count (preferred), s.characters.count (deprecated), or s.distance(from: s.startIndex, to: s.endIndex).
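The two conclusions meet whenever a range crosses the NSString boundary. The sketch below passes a UTF-16 based NSRange into NSRegularExpression and converts the resulting match range back into a Range<String.Index>. The helper name firstMatch(of:in:) is hypothetical and only serves this illustration.

```swift
import Foundation

// Hypothetical helper: returns the first substring of `s` matching `pattern`, or nil.
func firstMatch(of pattern: String, in s: String) -> Substring? {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return nil
    }
    // The search range is measured in UTF-16 code units, hence utf16.count.
    let whole = NSRange(location: 0, length: s.utf16.count)
    guard let match = regex.firstMatch(in: s, options: [], range: whole),
          // Range(_:in:) is failable: it returns nil when the NSRange
          // does not map onto valid String indices.
          let range = Range(match.range, in: s) else {
        return nil
    }
    return s[range]
}

print(firstMatch(of: "[0-9]+", in: "\u{1F468} 42") ?? "no match")
```

Using s.count for the length here would cut the search range short whenever the string contains characters that take more than one UTF-16 code unit.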

A helpful extension to get the full range of a String:

public extension String {

  var nsrange: NSRange {
    return NSRange(startIndex..<endIndex, in: self)
  }
}

Thus, you can call the original method like so:

replace("‍‍‍", characterAtIndex: "‍‍‍".utf16.count - 1) // ‍‍‍�!
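Because utf16.count - 1 points at the second half of a surrogate pair, the call above splits the final emoji. If the intent is to replace a whole visible character, one possible approach is to take the index in Characters and convert that Character's range to an NSRange. The function replaceCharacter(in:at:with:) is a hypothetical variant, not part of the original question. It uses NSString.replacingCharacters(in:with:) instead of the regular expression, on the assumption that a "." pattern could match several times inside one multi-scalar character.

```swift
import Foundation

// Hypothetical variant of replace(_:characterAtIndex:): `characterOffset` counts
// Characters (as in String.count), and the whole Character is replaced.
func replaceCharacter(in string: String, at characterOffset: Int, with replacement: String) -> String {
    guard characterOffset >= 0,
          let start = string.index(string.startIndex, offsetBy: characterOffset, limitedBy: string.endIndex),
          start < string.endIndex else {
        return string // offset out of bounds: return the input unchanged
    }
    let end = string.index(after: start)
    // Convert the Character's Range<String.Index> into a UTF-16 based NSRange.
    let range = NSRange(start..<end, in: string)
    return NSString(string: string).replacingCharacters(in: range, with: replacement)
}

print(replaceCharacter(in: "a\u{1F468}b", at: 1, with: "!")) // prints "a!b"
```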
Senseful