10

When swift uses String.count is it:

O(n) where each time we call it we iterate through the entire String in order to count it

or

O(1) where swift has previously stored the size of this array and simply accesses it.

Declan McKenna
  • 4,321
  • 6
  • 54
  • 72
  • 5
    It is `O(n)` for sure. From the [Swift Book](https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/StringsAndCharacters.html#//apple_ref/doc/uid/TP40014097-CH7-ID297): "be aware that the count property must iterate over the Unicode scalars in the entire string in order to determine the characters for that string". The only question is whether is a caching mechanism once `count` has been determined and the string has not been modified – Code Different May 28 '18 at 15:08
  • @CodeDifferent Interesting. That paragraph appears to imply that `Array.count` would be `O(1)` if each element takes up an equal amount of space. – Declan McKenna May 28 '18 at 15:16
  • @Deco that paragraph is specifically about `String.count` and not `Array.count`. Even though a `String` is actually a `Array` in Swift 4, you should never forget that it doesn't always behave like a normal `Array` due to the nature of Unicode grapheme clusters and normal characters. This is especially important for indexing and counting characters in a `String`. See this sentence from the same documentation: _"As a result, the number of characters in a string can’t be calculated without iterating through the string to determine its extended grapheme cluster boundaries."_ – Dávid Pásztor May 28 '18 at 15:18
  • @CodeDifferent Not really. `O(n)` would be the worst performance. Actually `String` has several ways to store data and maintains some helping flags, e.g. `isAscii` that can reduce the complexity to `O(1)`. – Sulthan May 28 '18 at 15:21
  • @Sulthan last time I looked, `String` uses either ASCII or UTF-16 internally. Given how many characters exist outside the ASCII table, I'd say assuming `O(n)` is the safer way. – Code Different May 28 '18 at 15:24
  • @Code And `opaque` :) I am trying to find some caching but it seems to me there is no `count` caching. You are right that we should always assume the worst though – Sulthan May 28 '18 at 15:36
  • 6
    Technically speaking, `O(n)` is an asymptotic *upper bound*, so even if `String` did cache the `count`, it would still be technically correct to describe the time complexity as `O(n)` ;) Also, [the documentation](https://developer.apple.com/documentation/swift/collection/2950091-count) for `Collection` says that the performance of `count` is "*`O(1)` if the collection conforms to `RandomAccessCollection`; otherwise, `O(n)`, where `n` is the length of the collection*" – so given that `String` isn't a `RandomAccessCollection`, it's `O(n)`. – Hamish May 28 '18 at 15:44
  • @CodeDifferent I don't think there is a caching mechanism. I mistakenly set n to the number of times I call `count` rather than the length of my `String` in my first answer. Our speed still went down at a rate of N. – Declan McKenna May 28 '18 at 16:16
  • @CodeDifferent This has been answered better in the comments than the answers. Feel free to put your initial comment in as an answer and I'll mark it correct. – Declan McKenna May 28 '18 at 16:24
  • @Hamish [`RandomAccessCollection`](https://developer.apple.com/documentation/swift/randomaccesscollection) means what exactly? Does it mean types that we can subscript them like `[2]` – mfaani May 28 '18 at 17:12
  • 3
    @Honey A `RandomAccessCollection` is a collection whose indices can be offset `n` places in `O(1)` time. Because `count` is a measure of how many times you can offset the `startIndex` to end up at the `endIndex`, random-access collections can always get their counts in constant time. Being a random access collection does not necessarily imply having integer indices though. For example, a `ReversedCollection<[Int]>` is a `RandomAccessCollection` (thanks to conditional conformances), but it has an opaque index type that keeps track of the base collection's index. – Hamish May 28 '18 at 18:22

3 Answers3

8

It is definitely O(n). From the Swift Book:

As a result, the number of characters in a string can't be calculated without iterating through the string to determine its extended grapheme cluster boundaries. If you are working with particularly long string values, be aware that the count property must iterate over the Unicode scalars in the entire string in order to determine the characters for that string.

This has a few implications, the biggest of which is integer subscripting (i.e. str[5]) is not available through the standard library. Internally, String uses ASCII or UTF-16 encoding (from Swift 5, it uses UTF-8 only). If the string only uses ASCII characters then count can be O(1) but ASCII only has 127 characters so consider this an exception rather than the rule.

NSString, on the other hand, always uses UTF-16 so accessing its length is O(1). And also keep in mind that NSString.length != String.count (try strings with emojis and you'll see).

As for your second question, it does not cache count for subsequent calls. Every call to count is thus O(n), even if the string has not changed. The code in the Foundation repo also confirms that.

Code Different
  • 90,614
  • 16
  • 144
  • 163
1

After failing to find documentation on this or being able to find this function within the source code I tested this myself using performance tests as described below. It assumed O(1) was possible based on PHP's Array being O(1). Swifts String.count function appears to be O(n).

Results

Unit Test Results

Is count cached when it's been called before now? (no)

I also tested to see if calling String.count once would cache it. By comparing results when count has already been called and when it has been stored to a variable to ensure it's not being stored prior to calling .count in our normal tests.

Caching Tests

Tests

import XCTest

class CountTests: XCTestCase {

    func test100K() {
        let testString = String(repeating: "a", count: 100000)
        self.measure {
            _ = testString.count
        }
    }

    func test1000K() {
        let testString = String(repeating: "a", count: 1000000)
        self.measure {
            _ = testString.count
        }
    }

    func test10000K() {
        let testString = String(repeating: "a", count: 10000000)
        self.measure {
            _ = testString.count
        }
    }

    func test10000KCached() {
        let testString = String(repeating: "a", count: 10000000)
        _ = testString.count
        self.measure {
            _ = testString.count
        }
    }

    func test10000KStrong() {
        let testString = String(repeating: "a", count: 10000000)
        let count = testString.count
        self.measure {
            _ = count
        }
    }
}
Declan McKenna
  • 4,321
  • 6
  • 54
  • 72
  • 3
    These tests don't even mean anything, since you're not changing `testString.count`, simply changing how many times you call `testString.count`... This has nothing to do with the asymptotic time complexity of `String.count`. – Dávid Pásztor May 28 '18 at 15:35
  • @DávidPásztor You're right thank you. I've updated my answer, would this be accurate? It appears like this is `O(n)`. – Declan McKenna May 28 '18 at 16:03
  • 1
    Your updated tests at least do take the size of the input into account, but they're still not really conclusive, since you're only testing plain ASCII Strings. Moreover, as it was established in the comments on the question itself, the documentation does state that `String.count` is `O(n)`, so there's no need for running any tests – Dávid Pásztor May 28 '18 at 16:15
  • 3
    It's not surprising the PHP's String has `O(1)` count and indexing. It uses fixed-length (8 bit) code units, and doesn't support Unicode. – Alexander May 28 '18 at 16:18
-2

Looks like O(n) to me based on a quick Playground test.

for step in 1...10 {
    let length = step * 100000
    let string = String(repeating: "x", count: length)
    let start = Date()
    let stringLength = string.count
    let end = Date()
    print("Length: \(stringLength), time: \(end.timeIntervalSince(start))")
}

// Length: 100000, time: 0.00178205966949463
// Length: 200000, time: 0.00132298469543457
// Length: 300000, time: 0.00184988975524902
// Length: 400000, time: 0.00218689441680908
// Length: 500000, time: 0.00302803516387939
// Length: 600000, time: 0.00368499755859375
// Length: 700000, time: 0.0039069652557373
// Length: 800000, time: 0.00444602966308594
// Length: 900000, time: 0.0052180290222168
// Length: 1000000, time: 0.00539696216583252
closetCoder
  • 1,064
  • 10
  • 21
  • 7
    Never use the playground to do any kind of performance tests. – rmaddy May 28 '18 at 15:12
  • 3
    You are not really checking the `count`. You are also checking how long it takes to generate the data. You are checking ASCII strings only anyway which is an error considering the implementation. – Sulthan May 28 '18 at 15:37
  • Admittedly quick and dirty, but the string is created before I capture the start time. – closetCoder May 28 '18 at 15:43
  • 1
    @closetCoder even if your test code wasn't quick and dirty, you should never assess performance in a Playground since there can be huge performance differences between Playground execution or execution on a real device, regardless of which Apple platform you're targeting. – Dávid Pásztor May 28 '18 at 16:11
  • OK, I'm sure you are right. Here are the results from the same code inside a fully compiled macOS app. Same machine. Same conclusion, I think. Length: 100000, time: 0.000870943069458008 Length: 200000, time: 0.00115108489990234 Length: 300000, time: 0.00173497200012207 Length: 400000, time: 0.00217998027801514 Length: 500000, time: 0.0027470588684082 Length: 600000, time: 0.00323700904846191 Length: 700000, time: 0.00389111042022705 Length: 800000, time: 0.00445103645324707 Length: 900000, time: 0.00502490997314453 Length: 1000000, time: 0.00524795055389404 – closetCoder May 28 '18 at 16:19
  • Repeated the test in macOS app using unicode character "" rather than "x". Much slower. Cut the length of the strings to speed it up. Definitely not O(1). Worse than O(n). Length: 1000, time: 0.0232720375061035 Length: 2000, time: 0.0928299427032471 Length: 3000, time: 0.196869015693665 Length: 4000, time: 0.343490958213806 Length: 5000, time: 0.536596059799194 Length: 6000, time: 0.76084303855896 Length: 7000, time: 1.03708600997925 Length: 8000, time: 1.35282206535339 Length: 9000, time: 1.72617793083191 Length: 10000, time: 2.12419903278351 – closetCoder May 28 '18 at 16:27
  • _Never use the playground to do any kind of performance tests_ . Why? @rmaddy – mfaani May 28 '18 at 17:17
  • 1
    @Honey Because the playground is doing all kinds of work behind the scenes. It's not a fair test. – rmaddy May 28 '18 at 17:26
  • @Honey have a look at [this](https://stackoverflow.com/a/24052267/4667835) SO answer about measuring code performance in a playground, written by an Engineering Manager at Apple. He clearly explains why the official standpoint is not to do any time measurements in a playground. – Dávid Pásztor May 28 '18 at 18:06