1

What is the time complexity of Python's for in iteration construction with a string?

E.g.,

for s in 'test':
   ...  # s = 't', 'e', 's', 't'

What is the total runtime of the loop?

Edit: I see I was confusing Python's string slice lookup with string iteration. Its index lookup is O(1) and it iterates at O(1), so the total loop should be O(n), same as a list.

Union find
  • 7,759
  • 13
  • 60
  • 111
  • 1
    "_The lookup time of a single index in a string is O(n)_". String indexing [should be O(1)](https://stackoverflow.com/a/41173143/5648954) – Nick Parsons Jan 11 '20 at 03:56
  • 1
    I see, thanks. Then the loop is O(n). Getting confused by Python string immutability. – Union find Jan 11 '20 at 03:58
  • @NickParsons Were you just commenting on that quote, or do you think that's relevant for the question? I don't think it is. This is not an index loop. Even if indexing wasn't O(1) because for example Python used variable-size characters or a linked list of characters, the *iterator* could still go through the string in O(n) because it remembers where it's at. – Kelly Bundy Jan 11 '20 at 04:28
  • @HeapOverflow it was more just a general observation. I'm not a python expert so I'm not aware of how an iterator for a string is implemented. But, if the iterator's next method were to use an index to get the next value and that index loop-up took O(n) then I would imagine the t.c to be O(n^2). – Nick Parsons Jan 11 '20 at 06:14
  • 1
    @NickParsons If you're interested about the iterator's implementation, see my answer's new section "Back to time complexity of string iteration". Summary: It does use an index, but if that wasn't O(1), I'm certain the iterator would use something else instead that would be O(1), as it has the advantage of keeping its state. So the whole iteration would still be O(n). But yeah, *if* index lookup took O(n) *and* the iterator used that, the iteration would take O(n^2). Though for that to happen, the Python implementers would need to be total noobs :-) – Kelly Bundy Jan 11 '20 at 16:08

2 Answers2

3

It's O(n), but the index lookup argument is a red herring.

How iteration works

Index lookup speed would matter if you did this:

for index in range(len(mystring)):
    char = mystring[index]
    ...

But you're not using indexes. You're using an iterator, more precisely a string iterator:

>>> iter('test')
<str_iterator object at 0x03569820>

That iterator remembers where it is in the string (any way it likes, doesn't need to be an "index"). And can be repeatedly asked for the next character:

>>> it = iter('test')
>>> next(it)
't'
>>> next(it)
'e'
>>> next(it)
's'
>>> next(it)
't'
>>> next(it)
Traceback (most recent call last):
  File "<pyshell#200>", line 1, in <module>
    next(it)
StopIteration

That's what the for-loop does. It creates that iterator and then repeatedly asks it for the next value until the iterator tells it to stop. And every value it gets from the iterator, it gives to your code in the variable you named. In other words, the for-loop is really just a middle-man between the iterator and your code in the loop body.

In contrast to strings, imagine a simple linked list. Index lookup in the linked list takes O(n), as every lookup requires walking from the start of the linked list to the desired node. But you can still easily do a full iteration in O(n), right? And an iterator object would keep a reference to the next node, so it would provide it in O(1) time (and then move its reference forward). So for a linked list, a for-loop using indexes would take O(n2) but a normal pythonic for-loop (implicitly using a linked list iterator) would be O(n).

You can even mimic a for-loop with a while-loop and an explicit iterator you handle yourself instead of letting a for-loop handle it for you. Instead of

for char in 'test':
    print(char)

do this:

it = iter('test')
while True:
    try:
        char = next(it)
    except StopIteration:
        break
    print(char)

That prints this:

t
e
s
t

Back to time complexity of string iteration

Let's look at the source code. I'm not super familiar with it, but I'll describe what I believe. Remember it's a str_iterator? What's str in Python 3 is called unicode in Python 2, and that's still the name in the C source code of Python 3. So in unicodeobject.c we find the string "str_iterator", and it's in the "Unicode Iterator" section. Excerpts:

/********************* Unicode Iterator **************************/

typedef struct {
    ...
    Py_ssize_t it_index;
    PyObject *it_seq;    /* Set to NULL when iterator is exhausted */
} unicodeiterobject;
...
unicodeiter_next(unicodeiterobject *it)
{
    ...
    seq = it->it_seq;
      ...
        void *data = PyUnicode_DATA(seq);
        Py_UCS4 chr = PyUnicode_READ(kind, data, it->it_index);
        item = PyUnicode_FromOrdinal(chr);
        if (item != NULL)
            ++it->it_index;
        return item;
    ...
}
...
PyTypeObject PyUnicodeIter_Type = {
    ...
    "str_iterator",         /* tp_name */
   ...
};

So it's a unicodeiterobject with a pointer it_seq to the string it iterates and an index it_index. And its next function uses them to get the next character, increases the index, and returns the character. Ok, turns out the iterator does use an index internally. But that indexing is at a lower, more direct internal level than the indexing you do from Python, which uses the unicode_getitem function:

static PyObject *
unicode_getitem(PyObject *self, Py_ssize_t index)
{
    void *data;
    enum PyUnicode_Kind kind;
    Py_UCS4 ch;
    ...
    if (index < 0 || index >= PyUnicode_GET_LENGTH(self)) {
        PyErr_SetString(PyExc_IndexError, "string index out of range");
        return NULL;
    }
    kind = PyUnicode_KIND(self);
    data = PyUnicode_DATA(self);
    ch = PyUnicode_READ(kind, data, index);
    return unicode_char(ch);
}

Both look similar and in the end use PyUnicode_READ(kind, data, index). I couldn't find that one, but it should be fairly simple and O(1), making the whole iteration O(n).

One more thing: The answer/question that @NickParsons pointed out above deal with the worry of Python using variable-size multibyte character representations, which could make index lookups O(n) instead of O(1). Even if that were the case, it would only affect the unicode_getitem function. Not the str_iterator iterator. Because the iterator would absolutely certainly not use a naive "string index" then, but a pointer to the first byte of the next character, so that it could read and advance in O(1). So its whole iteration would still be O(n).

Kelly Bundy
  • 23,480
  • 7
  • 29
  • 65
  • Wish I'd accepted this. Is there a way to reveal what the string iterator does under the hood? I will try some experiments today. – Union find Jan 11 '20 at 13:23
  • 1
    @Learningstatsbyexample Done, see the added section "Back to time complexity of string iteration". – Kelly Bundy Jan 11 '20 at 15:51
  • Amazing. Feel free to upvote the question.. this may get deleted and your response will go away. This is so helpful. – Union find Jan 11 '20 at 15:54
2

O(n) is the answer. check this website for learn more about timeComplexity python time complexity