3

In Swift 3, you can count the characters in a String with:

str.characters.count

I need to do this frequently, and that line above looks like it could be O(N). Is there a way to get a string length, or a length of something — maybe the underlying unicode buffer — with an operation that is guaranteed to not have to walk the entire string? Maybe:

str.utf16.count

I ask because I'm checking the length of some text every time the user types a character, to limit the size of a UITextView. The call doesn't need to be an exact count of the glyphs, like characters.count.

Rob N
  • 15,024
  • 17
  • 92
  • 165
  • Have you run a performance test and found that calculating the string length is your bottleneck? You are probably fine to check the length of a string every time the user enters a character. Just because something is O(n) doesn't mean it's slow. – nathangitter Sep 07 '17 at 19:34
  • You could try `star.characters.underestimatedCount` instead. But you might not see a significant performance increase from the standard `count` property. Both properties are *only* computed in O(1) time for collections conforming to `RandomAccessCollection` protocol, which `String` doesn't. – Paulo Mattos Sep 07 '17 at 19:48

1 Answers1

4

This is a good question. The answer is... complicated. Converting from UTF-8 to UTF-16, or vice-versa, or converting to or from some other encoding, will all require examining the string, since the characters can be made up of more than one code unit. So if you want to get the count in constant time, it's going to come down to what the internal representation is. If the string is using UTF-16 internally, then it's a reasonable assumption that string.utf16.count would be in constant time, but if the internal representation is UTF-8 or something else, then the string will need to be analyzed to determine what the length in UTF-16 would be. So what's String using internally? Well:

https://github.com/apple/swift/blob/master/stdlib/public/core/StringCore.swift

/// The core implementation of a highly-optimizable String that
/// can store both ASCII and UTF-16, and can wrap native Swift
/// _StringBuffer or NSString instances.

This is discouraging. The internal representation could be ASCII or UTF-16, or it could be wrapping a Foundation NSString. Hrm. We do know that NSString uses UTF-16 internally, since this is actually documented, so that's good. So the main outlier here is when the string stores ASCII. The saving grace is that since the first 128 Unicode code points have the same values as the ASCII character set, any ASCII character 0xXX should correspond to the UTF-16 character 0x00XX, so the UTF-16 length should simply be the ASCII length times two, and thus calculable in constant time. Is this the case in the implementation? Let's look.

In the UTF16View source, there is no implementation of count. It appears that count is inherited from Collection's implementation, which is implemented via distance():

public var count: IndexDistance {
  return distance(from: startIndex, to: endIndex)
}

UTF16View's implementation of distance() looks like this:

public func distance(from start: Index, to end: Index) -> IndexDistance {
  // FIXME: swift-3-indexing-model: range check start and end?
  return start.encodedOffset.distance(to: end.encodedOffset)
}

And in the String.Index source, encodedOffset looks like this:

public var encodedOffset : Int {
  return Int(_compoundOffset >> _Self._strideBits)
}

where _compoundOffset appears to be a simple 64-bit integer:

internal var _compoundOffset : UInt64

and _strideBits appears to be a straight integer as well:

internal static var _strideBits : Int { return 2 }

So it... looks... like you should get constant time from string.utf16.count, since unless I'm making a mistake somewhere, you're just bit-shifting a couple of integers and then comparing the results (I'd probably still run some tests to be sure). The caveat is, of course, that this isn't documented, and thus could change in the future—particularly since the documentation for String does claim that it needs to iterate through the string:

Unlike with isEmpty, calculating a view’s count property requires iterating through the elements of the string.

With all that said, you're using a UITextView, which is implemented in Objective-C via NSAttributedString. If you're willing to incur the Objective-C message-passing overhead (which, let's be honest, is probably occurring under the scenes anyway to generate the String), you can just call its length property, which, since NSAttributedString is built on top of NSString, which does guarantee that it uses UTF-16 internally, is almost certain to be in constant time.

Charles Srstka
  • 16,665
  • 3
  • 34
  • 60
  • I could be wrong but I remember seeing a thread that says that `string.utf16.count` is `O(n)` https://stackoverflow.com/questions/33094432/runtime-complexity-of-getting-the-length-of-a-string-in-different-representation – TNguyen Sep 07 '17 at 21:19
  • That was from two years ago, before the source code was available. It's also possible that things have changed since then. The source code seems to suggest that it is `O(1)`, unless I'm reading it incorrectly. – Charles Srstka Sep 07 '17 at 21:53