It's O(n), but the index lookup argument is a red herring.
How iteration works
Index lookup speed would matter if you did this:
for index in range(len(mystring)):
char = mystring[index]
...
But you're not using indexes. You're using an iterator, more precisely a string iterator:
>>> iter('test')
<str_iterator object at 0x03569820>
That iterator remembers where it is in the string (any way it likes, doesn't need to be an "index"). And can be repeatedly asked for the next character:
>>> it = iter('test')
>>> next(it)
't'
>>> next(it)
'e'
>>> next(it)
's'
>>> next(it)
't'
>>> next(it)
Traceback (most recent call last):
File "<pyshell#200>", line 1, in <module>
next(it)
StopIteration
That's what the for
-loop does. It creates that iterator and then repeatedly asks it for the next value until the iterator tells it to stop. And every value it gets from the iterator, it gives to your code in the variable you named. In other words, the for
-loop is really just a middle-man between the iterator and your code in the loop body.
In contrast to strings, imagine a simple linked list. Index lookup in the linked list takes O(n), as every lookup requires walking from the start of the linked list to the desired node. But you can still easily do a full iteration in O(n), right? And an iterator object would keep a reference to the next node, so it would provide it in O(1) time (and then move its reference forward). So for a linked list, a for
-loop using indexes would take O(n2) but a normal pythonic for
-loop (implicitly using a linked list iterator) would be O(n).
You can even mimic a for
-loop with a while
-loop and an explicit iterator you handle yourself instead of letting a for
-loop handle it for you. Instead of
for char in 'test':
print(char)
do this:
it = iter('test')
while True:
try:
char = next(it)
except StopIteration:
break
print(char)
That prints this:
t
e
s
t
Back to time complexity of string iteration
Let's look at the source code. I'm not super familiar with it, but I'll describe what I believe. Remember it's a str_iterator
? What's str
in Python 3 is called unicode
in Python 2, and that's still the name in the C source code of Python 3. So in unicodeobject.c
we find the string "str_iterator"
, and it's in the "Unicode Iterator" section. Excerpts:
/********************* Unicode Iterator **************************/
typedef struct {
...
Py_ssize_t it_index;
PyObject *it_seq; /* Set to NULL when iterator is exhausted */
} unicodeiterobject;
...
unicodeiter_next(unicodeiterobject *it)
{
...
seq = it->it_seq;
...
void *data = PyUnicode_DATA(seq);
Py_UCS4 chr = PyUnicode_READ(kind, data, it->it_index);
item = PyUnicode_FromOrdinal(chr);
if (item != NULL)
++it->it_index;
return item;
...
}
...
PyTypeObject PyUnicodeIter_Type = {
...
"str_iterator", /* tp_name */
...
};
So it's a unicodeiterobject
with a pointer it_seq
to the string it iterates and an index it_index
. And its next
function uses them to get the next character, increases the index, and returns the character. Ok, turns out the iterator does use an index internally. But that indexing is at a lower, more direct internal level than the indexing you do from Python, which uses the unicode_getitem
function:
static PyObject *
unicode_getitem(PyObject *self, Py_ssize_t index)
{
void *data;
enum PyUnicode_Kind kind;
Py_UCS4 ch;
...
if (index < 0 || index >= PyUnicode_GET_LENGTH(self)) {
PyErr_SetString(PyExc_IndexError, "string index out of range");
return NULL;
}
kind = PyUnicode_KIND(self);
data = PyUnicode_DATA(self);
ch = PyUnicode_READ(kind, data, index);
return unicode_char(ch);
}
Both look similar and in the end use PyUnicode_READ(kind, data, index)
. I couldn't find that one, but it should be fairly simple and O(1), making the whole iteration O(n).
One more thing: The answer/question that @NickParsons pointed out above deal with the worry of Python using variable-size multibyte character representations, which could make index lookups O(n) instead of O(1). Even if that were the case, it would only affect the unicode_getitem
function. Not the str_iterator
iterator. Because the iterator would absolutely certainly not use a naive "string index" then, but a pointer to the first byte of the next character, so that it could read and advance in O(1). So its whole iteration would still be O(n).