2

Suppose I want to get a specific character of a string in Python 2.7, suppose

a = 'abcdefg...' # a long string
print a[5]

Wondering when access any specific character of a string, for example, access the 5th element, wondering what is the performance, is it constant time O(1), or linear performance O(n) either according the 5 (the position of the character we are accessing), or linear performance O(n) to the whole string (len(a) in this example)?

bad_coder
  • 11,289
  • 20
  • 44
  • 72
Lin Ma
  • 9,739
  • 32
  • 105
  • 175
  • 1
    I like the idea that it might be linear performance on the length of the string. Like, accessing the first character of a huge string takes forever because it has to preload it all, or it's stored in some garbled dictionary/linked list thing :) Python has done weirder things I suppose... – en_Knight Jun 16 '16 at 03:46

2 Answers2

4
>>> long_string_1M ="".join(random.choice(string.printable) for _ in xrange(1000000))
>>> short_string = "hello"
>>> timeit.timeit(lambda:long_string_1M[50000])
0.1487280547441503
>>> timeit.timeit(lambda:short_string[4])
0.1368805315209798
>>> timeit.timeit(lambda:short_string[random.randint(0,4)])
1.7327393072888242
>>> timeit.timeit(lambda:long_string_1M[random.randint(50000,100000)])
1.779330312345877

looks like O(1) to me

they acheive it because a string is consecutive memory locations so indexing into it is simply a matter of offsetting ... there is no seek (at least that is my understanding) if you know c/c++ its something like *(pointer+offset) (its been a long time since ive done C so that might be a little wrong)

Joran Beasley
  • 110,522
  • 12
  • 160
  • 179
  • Thanks Joran, vote up and I have the same feeling/result, and curious how it is implemented internally to have O(1) performance? Any detailed or high level idea is appreciated. :) – Lin Ma Jun 16 '16 at 03:30
  • 2
    I assumed it because of the details at https://wiki.python.org/moin/TimeComplexity ... although they do not callout string specifically it is close enough to a list ... – Joran Beasley Jun 16 '16 at 03:32
  • Nice reference Joran. Vote up, and will mark your reply as answer in 2 mins when SO allows me. You answered too fast. :) – Lin Ma Jun 16 '16 at 03:33
  • 1
    I agree - anything in that table that talks about tuples can probably be applied to strings (didn't look that carefully, but probably). There are a bunch of places where the python docs draw analogy and the implementation is similar – en_Knight Jun 16 '16 at 03:44
4

In addition to Joran's answer, I'd point you to this reference implementation, confirming his answer that it is O(1) lookup

/* String slice a[i:j] consists of characters a[i] ... a[j-1] */        
static PyObject *    
string_slice(register PyStringObject *a, register Py_ssize_t i,    
             register Py_ssize_t j)    
     /* j -- may be negative! */    
{    
    if (i < 0)    
        i = 0;    
    if (j < 0)    
        j = 0; /* Avoid signed/unsigned bug in next line */    
    if (j > Py_SIZE(a))    
        j = Py_SIZE(a);    
    if (i == 0 && j == Py_SIZE(a) && PyString_CheckExact(a)) {    
        /* It's the same as a */    
        Py_INCREF(a);    
        return (PyObject *)a;    
    }    
    if (j < i)  
        j = i;    
    return PyString_FromStringAndSize(a->ob_sval + i, j-i);    
}

Why this should be your intuition

Python strings are immutable. This common optimization allows tricks like assuming contiguous data when desirable. Note that under the hood, we sometimes just need to compute the offset from the memory location in C (obviously implementation specific)

There are several places where the immutability of strings is something that can be relied on (or vexed by). In the python author's words;

There are several advantages [to strings being immutable]. One is performance: knowing that a string is immutable means we can allocate space for it at creation time

So although we may not be able to guarantee, as far as I know, this behaviour across implementations, it's awfully safe to assume.

Community
  • 1
  • 1
en_Knight
  • 5,301
  • 2
  • 26
  • 46
  • 2
    lol ... great minds think alike ... (see my edit) nice link to ref +1 from me – Joran Beasley Jun 16 '16 at 03:36
  • 2
    @Joran haha ya +1 on yours too , that edit came in as I was typing mine up. I was hoping you weren't going to throw in my link otherwise I would have to cancel the post – en_Knight Jun 16 '16 at 03:39
  • @IharBury thanks for the feedback on an old question :) Why is PyString_FromStringAndSize not O(1)? Looking at the same reference implementation, I don't see any "loops"; there is a malloc call, is that what you consider to be "not O(1)"? I don't usually consider those proportional to the size of the string at all in, which seems consistent with other SO answers: https://stackoverflow.com/questions/282926/time-complexity-of-memory-allocation, https://cs.stackexchange.com/questions/83834/what-is-the-time-complexity-of-memory-allocation-assumed-to-be – en_Knight Oct 03 '19 at 16:15
  • @IharBury hmm where does that happen *on the entire string*? It's true that it might copy the slice of the string, but the OP Is asking about accessing a single character, so that would be a memcpy on one element (and the reference has a specific case, it looks to me, to *not* memcpy for size-1 objects, but intern them (line 104). Note that j-1==1 for the OP's case, we're not calling PyString_FromStringAndSize on the entire initial object – en_Knight Oct 05 '19 at 20:48