This is a good question. The answer is... complicated. Converting from UTF-8 to UTF-16, or vice versa, or converting to or from some other encoding, will all require examining the string, since the characters can be made up of more than one code unit. So if you want to get the count in constant time, it's going to come down to what the internal representation is. If the string is using UTF-16 internally, then it's a reasonable assumption that `string.utf16.count` would be constant time, but if the internal representation is UTF-8 or something else, then the string will need to be analyzed to determine what the length in UTF-16 would be. So what's `String` using internally? Well:

https://github.com/apple/swift/blob/master/stdlib/public/core/StringCore.swift
```swift
/// The core implementation of a highly-optimizable String that
/// can store both ASCII and UTF-16, and can wrap native Swift
/// _StringBuffer or NSString instances.
```
This is discouraging. The internal representation could be ASCII or UTF-16, or it could be wrapping a Foundation `NSString`. Hrm. We do know that `NSString` uses UTF-16 internally, since this is actually documented, so that's good. So the main outlier here is when the string stores ASCII. The saving grace is that since the first 128 Unicode code points have the same values as the ASCII character set, any ASCII character `0xXX` should correspond to the UTF-16 code unit `0x00XX`, so the UTF-16 code unit count should simply equal the ASCII character count, and thus be calculable in constant time. Is this the case in the implementation? Let's look.
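
To make that correspondence concrete, here's a quick illustration (my own example, not from the stdlib): every ASCII character occupies exactly one UTF-16 code unit, while characters outside the Basic Multilingual Plane take two.

```swift
let ascii = "hello"
print(ascii.count)        // 5: one Character per ASCII byte
print(ascii.utf16.count)  // 5: and one UTF-16 code unit each

let wave = "hi👋"
print(wave.count)         // 3: three Characters (grapheme clusters)
print(wave.utf16.count)   // 4: 👋 (U+1F44B) needs a surrogate pair
```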
In the `UTF16View` source, there is no implementation of `count`. It appears that `count` is inherited from `Collection`'s default implementation, which is implemented via `distance()`:
```swift
public var count: IndexDistance {
    return distance(from: startIndex, to: endIndex)
}
```
`UTF16View`'s implementation of `distance()` looks like this:
```swift
public func distance(from start: Index, to end: Index) -> IndexDistance {
    // FIXME: swift-3-indexing-model: range check start and end?
    return start.encodedOffset.distance(to: end.encodedOffset)
}
```
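
So the whole chain is: `count` on the view delegates to `distance(from:to:)` between the two boundary indices. Just to show the public API involved, a trivial sketch of my own:

```swift
let s = "café"
let view = s.utf16
// Collection's default count is exactly this distance computation:
let n = view.distance(from: view.startIndex, to: view.endIndex)
print(n == view.count)  // true (4 code units; a precomposed é is a single BMP code unit)
```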
And in the `String.Index` source, `encodedOffset` looks like this:
```swift
public var encodedOffset : Int {
    return Int(_compoundOffset >> _Self._strideBits)
}
```
where `_compoundOffset` appears to be a simple 64-bit integer:

```swift
internal var _compoundOffset : UInt64
```
and `_strideBits` appears to be a plain integer as well:

```swift
internal static var _strideBits : Int { return 2 }
```
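
Putting those pieces together, the arithmetic reduces to a shift and a subtraction. Here's a toy model (the real properties are internal to the stdlib; the packed values below are made up purely for illustration):

```swift
let _strideBits = 2
let startCompound: UInt64 = 0   // hypothetical packed offset for startIndex
let endCompound: UInt64 = 40    // hypothetical packed offset for endIndex

let startOffset = Int(startCompound >> _strideBits)  // 0
let endOffset = Int(endCompound >> _strideBits)      // 10

// distance(from:to:) on Int then boils down to integer subtraction: O(1).
let count = startOffset.distance(to: endOffset)      // 10
print(count)
```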
So it... looks... like you should get constant time from `string.utf16.count`, since unless I'm making a mistake somewhere, you're just bit-shifting a couple of integers and then subtracting one from the other (I'd probably still run some tests to be sure). The caveat is, of course, that this isn't documented, and thus could change in the future, particularly since the documentation for `String` does claim that it needs to iterate through the string:

> Unlike with `isEmpty`, calculating a view's `count` property requires iterating through the elements of the string.
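
If you want to check the constant-time behavior empirically (as I said, I'd run some tests), a crude sketch of my own might look like this; if `utf16.count` really is O(1), the two timings should be roughly the same despite the 1000x difference in length. This is a quick-and-dirty harness, not a rigorous benchmark; a real one would want repeated runs and an optimized build.

```swift
import Foundation

func measure(_ label: String, _ body: () -> Void) {
    let start = Date()
    body()
    print(label, Date().timeIntervalSince(start))
}

let short = String(repeating: "a", count: 10_000)
let long = String(repeating: "a", count: 10_000_000)

// Loop so the elapsed time is measurable at all.
measure("short") { for _ in 0..<100_000 { _ = short.utf16.count } }
measure("long")  { for _ in 0..<100_000 { _ = long.utf16.count } }
```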
With all that said, you're using a `UITextView`, which is implemented in Objective-C via `NSAttributedString`. If you're willing to incur the Objective-C message-passing overhead (which, let's be honest, is probably occurring behind the scenes anyway to generate the `String`), you can just call its `length` property. Since `NSAttributedString` is built on top of `NSString`, which does guarantee that it uses UTF-16 internally, that's almost certain to be constant time.
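
In code, that might look something like this (a sketch of my own; `textStorage` is the `NSTextStorage`, an `NSMutableAttributedString` subclass, that backs the text view):

```swift
import UIKit

// Both of these report length in UTF-16 code units, which is what
// NSString and NSAttributedString mean by "length".
func utf16Length(of textView: UITextView) -> Int {
    return textView.textStorage.length
}

// Or, bridging the plain text through NSString:
func utf16LengthViaNSString(of textView: UITextView) -> Int {
    return ((textView.text ?? "") as NSString).length
}
```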