2

I'd like to get a substring containing just the first N characters of a string (10, for example).

But the string is the generated output of a function.

I'd like to do it like this...

let string = someInputString
    .someGeneratorFunctionThatReturnsAString()
    .substring(to: 10)

But I can't do this, because the index in the substring function needs the string itself to determine where the 10th character is.

  • First question... why is this the case?
  • Second question... how can I get a substring like this without needing the original string to get the index?
Fogmeister
  • 3
    From what I've gathered, it's more or less like this: Strings in Swift are internally stored as UTF-8, so random access is not possible (characters are of variable byte length), so there's this thing called "indices" that you create from a specific string and modify by advancing or going back an integer number of places (sequentially), and they work only for that string. – Nicolas Miari Jul 24 '17 at 09:50
  • @NicolasMiari ah, that makes sense. Add it as an answer :D – Fogmeister Jul 24 '17 at 09:52
  • Thanks, but my knowledge is quite shallow and might be slightly off. I'll leave the honour to someone else. In the meantime, you can check out this link: https://oleb.net/blog/2016/08/swift-3-strings/ – Nicolas Miari Jul 24 '17 at 09:54
  • 2
    @NicolasMiari: Actually strings use internally either ASCII or UTF-16, but that is an implementation detail and might change in the future. – Martin R Jul 24 '17 at 10:00
  • @MartinR Alright, UTF-16 too is variable length, right? I wonder how come accessing the n-th character was straightforward in Objective-C... Must study this – Nicolas Miari Jul 24 '17 at 10:02
  • 1
    @NicolasMiari: Not really. But a `Character` is an "extended grapheme cluster" and consists of one or many unicode scalars. – Martin R Jul 24 '17 at 10:03
  • 1
    @Fogmeister: Here is a simple solution for your problem: https://stackoverflow.com/a/32984213/1187415. – Martin R Jul 24 '17 at 10:04
  • 1
    @Fogmeister See? I was talking out of my a** :-) Listen to the people who know! – Nicolas Miari Jul 24 '17 at 10:11
  • More helpers here: https://stackoverflow.com/questions/24092884/get-nth-character-of-a-string-in-swift-programming-language. – Martin R Jul 24 '17 at 10:53
  • 1
    ... on the other hand, I am not a fan of adding all these extensions just to "make it work with integer indices", that can be a performance bottleneck. See https://stackoverflow.com/questions/40371929/slow-swift-string-performance for an example. – Martin R Jul 24 '17 at 11:00

2 Answers

3

I may have misunderstood your question, but here is an easy way to take a substring with an integer offset:

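extension String {
    /// Returns the first `to` characters, or the whole string when it is shorter.
    func substring(to: Int) -> String {
        // Clamp to endIndex so an offset past the end does not trap.
        let end = index(startIndex, offsetBy: to, limitedBy: endIndex) ?? endIndex
        return String(self[..<end])
    }
}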
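
If the goal is only the first N characters, the standard library's prefix(_:) avoids index arithmetic altogether and fits the chained style from the question. This is a minimal sketch, assuming Swift 4 or later; someGeneratorFunctionThatReturnsAString() is defined here only as a hypothetical stand-in for the function in the question:

```swift
extension String {
    // Hypothetical stand-in for the generator function in the question.
    func someGeneratorFunctionThatReturnsAString() -> String {
        return self + ", generated"
    }
}

let someInputString = "Hello"

// prefix(_:) returns a Substring holding at most 10 characters;
// wrapping it in String(...) copies it into a standalone String.
let firstTen = String(
    someInputString
        .someGeneratorFunctionThatReturnsAString()
        .prefix(10)
)

// A string shorter than the requested length is returned whole instead of trapping.
let short = String("abc".prefix(10))

print(firstTen) // Hello, gen
print(short)    // abc
```

The string still has to be walked character by character to find where the prefix ends; the call just hides the index bookkeeping.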
Gunhan
1

For question 2, see Gunhan's answer.

For question 1:

Long Answer: Basically, in Swift, the Collection protocol (which types like Array and String adopt) specifies performance criteria. Most notably, the start and end indices of a collection must be accessible in O(1) time. This goes for subscript access of elements as well (see text under Expected Performance heading on https://developer.apple.com/documentation/swift/collection).
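
To make the Collection point concrete, here is a rough sketch (Swift 4 or later assumed) of a generic helper written only against the Collection index API. The name leading(_:of:) is hypothetical, not a standard library function. The same code serves both Array and String. Note that the O(1) expectation covers subscripting with an index you already have; turning an integer offset into a String index can still take time proportional to the offset.

```swift
// Hypothetical helper: the first `n` elements of any collection.
func leading<C: Collection>(_ n: Int, of collection: C) -> C.SubSequence {
    // Walk from startIndex, stopping at endIndex if the collection is too short.
    let end = collection.index(collection.startIndex,
                               offsetBy: n,
                               limitedBy: collection.endIndex) ?? collection.endIndex
    return collection[collection.startIndex..<end]
}

let numbers = Array(leading(2, of: [10, 20, 30]))   // [10, 20]
let letters = String(leading(3, of: "hello"))       // "hel"
```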

Indices are opaque positions within one particular string's storage, not memory addresses and not plain character counts. Because a Character can occupy a variable number of code units, the position of the Nth character can only be found by walking that specific string from the start; the resulting String.Index records where the walk ended, and subscripting with it is then O(1). While the characters in let x = "hello" and let y = "hello" are the same, an index computed from x is only documented to be valid for x, and one computed from y only for y. So you should not take the substring "llo" from x and the substring "llo" from y using the same indices: as soon as the two strings differ in content, an index from one can land on a different character, in the middle of a character, or past the end of the other.
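
A small sketch of why an index only makes sense for the string that produced it (Swift 4 or later assumed; the variable names are made up for the example). The third character sits at a different distance from the start in each string, so "offset 2" has to be resolved against a concrete string:

```swift
let plain = "hello"
let accented = "h\u{E9}llo"   // "héllo"; é takes two bytes in UTF-8

// Ask each string for the index of its own third character.
let plainIndex = plain.index(plain.startIndex, offsetBy: 2)
let accentedIndex = accented.index(accented.startIndex, offsetBy: 2)

// Both indices select the character "l" in their own string...
let plainChar = plain[plainIndex]
let accentedChar = accented[accentedIndex]

// ...but they lie at different distances from the start of the UTF-8 view.
let plainBytes = plain.utf8.distance(from: plain.utf8.startIndex, to: plainIndex)
let accentedBytes = accented.utf8.distance(from: accented.utf8.startIndex, to: accentedIndex)

print(plainChar, plainBytes)        // l 2
print(accentedChar, accentedBytes)  // l 3
```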

Architecturally, there are three entities having interplay here:

String |--hasIndicesOfType--> String.Index |--whichPointTo--> Character

(Side note: Index is an associated type of the Collection protocol, and String provides it as the nested type String.Index, which is why you see the notation String.Index if you were to, say, log one of these to the Xcode console.)

In a nutshell: the function needs the string itself because an index is a position within that string's storage. String conforms to Collection, which expects O(1) access to elements through an index; hence the indices. Because an index describes a position in one particular string's contents, it is only meaningful for that string (String is a value type, so this is about the contents, not object identity): you cannot reliably access string A's characters with an index computed from string B.

In a super tiny nutshell: because String conforms to Collection.

Some helpful tutorial-ish code:
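
A minimal sketch along those lines, assuming Swift 4.2 or later (for firstIndex(of:)); all names are illustrative:

```swift
// A Character is an extended grapheme cluster: one visible character
// may be made of several Unicode scalars.
let accentedE = "e\u{301}"                       // "e" + combining acute accent
let characterCount = accentedE.count             // 1
let scalarCount = accentedE.unicodeScalars.count // 2
let byteCount = accentedE.utf8.count             // 3

// Indices come from the string and are handed back to the same string.
let greeting = "Hello, world"
let commaIndex = greeting.firstIndex(of: ",") ?? greeting.endIndex
let beforeComma = String(greeting[..<commaIndex])          // "Hello"

// endIndex is one past the last character, so step back once to reach it.
let lastCharacter = greeting[greeting.index(before: greeting.endIndex)]  // "d"
```

Because a single Character can span a varying number of scalars and bytes, an integer alone cannot locate a character without scanning, which is the gap String.Index fills.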

Teleology: Apparent end-goal-reasoning (aka teleological analysis) behind why String is the way it is in Swift:

Contributors to the language preferred optimizing time complexity over space complexity.

Jacob M. Barnard