TL;DR: Just because string concatenation is optimized under certain circumstances doesn't mean it's necessarily `O(1)`; it's just not always `O(n)`. What ultimately determines the performance is your system, and it could be smart (beware!). Lists, which guarantee amortized `O(1)` append operations, are still much faster and better at avoiding reallocations.
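If you just want the pattern that sidesteps the whole problem: collect the pieces in a list and join once at the end. A minimal sketch:

```python
parts = []
for _ in range(10000):
    parts.append('a')   # amortized O(1) thanks to list over-allocation
s = ''.join(parts)      # one final copy instead of many reallocations
```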
This is an extremely complicated problem, because it's hard to "measure quantitatively". If you read the announcement:
> String concatenations in statements of the form `s = s + "abc"` and `s += "abc"` are now performed more efficiently in certain circumstances.
If you take a closer look at it, you'll note that it mentions "certain circumstances". The tricky thing is to find out what these certain circumstances are. One is immediately obvious:
- Nothing else may hold a reference to the original string. Otherwise it wouldn't be safe to change `s` in place (demonstrated in the sketch below).
But another condition is:
- The system must be able to do the reallocation in `O(1)` - that means without needing to copy the contents of the string to a new location.
That's where it gets tricky, because the system is responsible for doing the reallocation. That's nothing you can control from within Python. However, your system is smart. That means in many cases it can actually do the reallocation without needing to copy the contents. You might want to take a look at @TimPeters answer, which explains some of it in more detail.
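Here is a minimal sketch of the first condition, watching `id` to see whether a new string object is created (whether the first case keeps its `id` is a CPython implementation detail, so treat the output as illustrative):

```python
s = 'a' * 1000
old_id = id(s)
s += 'b'                # only one reference: CPython may resize in place
print(id(s) == old_id)  # often True, but an implementation detail

s = 'a' * 1000
alias = s               # a second reference to the same object
old_id = id(s)
s += 'b'                # changing in place would now be unsafe
print(id(s) == old_id)  # False - a new string object was created
```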
I'll approach this problem from an experimentalist's point of view.
You can easily check how many reallocations actually need a copy by checking how often the `id` changes (because the `id` function in CPython returns the memory address):
```python
changes = []
s = ''
changes.append((0, id(s)))
for i in range(10000):
    s += 'a'
    if id(s) != changes[-1][1]:
        changes.append((len(s), id(s)))
print(len(changes))
```
This gives a different number each run (or almost each run). It's somewhere around 500 on my computer. Even for `range(10000000)` it's just 5000 on my computer.
But if you think that's really good at "avoiding" copies, you're wrong. If you compare it to the number of resizes a `list` needs (`list`s over-allocate intentionally so `append` is amortized `O(1)`):
```python
import sys

changes = []
s = []
changes.append((0, sys.getsizeof(s)))
for i in range(10000000):
    s.append(1)
    if sys.getsizeof(s) != changes[-1][1]:
        changes.append((len(s), sys.getsizeof(s)))
print(len(changes))
```
That only needs 105 reallocations (always).
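You can reproduce a number in that ballpark with a short simulation. As a sketch, here is CPython 3's list growth rule approximated as `newsize + (newsize >> 3) + 6` (the exact formula lives in `Objects/listobject.c` and differs slightly between versions, so treat the constants as assumptions):

```python
def count_resizes(n):
    """Count reallocations for n appends, using a growth rule similar
    to CPython 3's list_resize (assumed: newsize + (newsize >> 3) + 6;
    the exact formula differs between CPython versions)."""
    allocated = 0
    resizes = 0
    for newsize in range(1, n + 1):
        if newsize > allocated:
            allocated = newsize + (newsize >> 3) + 6  # assumed growth rule
            resizes += 1
    return resizes

print(count_resizes(10000000))  # on the order of 100, like the measurement
```

The ~12.5% over-allocation means each resize buys proportionally more room, which is why the count grows only logarithmically with the list size.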
I mentioned that `realloc` could be smart, and I intentionally kept the "sizes" at which the reallocs happened in a list. Many C allocators try to avoid memory fragmentation, and at least on my computer the allocator does something different depending on the current size:
```python
# changes is the one from the 10 million character run
%matplotlib notebook  # requires IPython!

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(1)
ax = plt.subplot(111)

#ax.plot(sizes, num_changes, label='str')
ax.scatter(np.arange(len(changes)-1),
           np.diff([i[0] for i in changes]),  # plotting the difference!
           s=5, c='red',
           label='measured')
ax.plot(np.arange(len(changes)-1),
        [8]*(len(changes)-1),
        ls='dashed', c='black',
        label='8 bytes')
ax.plot(np.arange(len(changes)-1),
        [4096]*(len(changes)-1),
        ls='dotted', c='black',
        label='4096 bytes')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('x-th copy')
ax.set_ylabel('characters added before a copy is needed')
ax.legend()
plt.tight_layout()
```

*(The plot shows the number of characters added before a copy was needed versus the x-th copy, on log-log axes, with 8-byte and 4096-byte reference lines.)*

Note that the x-axis represents the number of "copies done", not the size of the string!

That graph was actually very interesting for me, because it shows clear patterns: for small arrays (up to 465 elements) the steps are constant - it needs to reallocate for every 8 elements added. Then it needs to actually allocate a new array for every character added, and at roughly 940 all bets are off until (roughly) one million elements. Then it seems to allocate in blocks of 4096 bytes.
My guess is that the C allocator uses different allocation schemes for differently sized objects. Small objects are allocated in blocks of 8 bytes, then for bigger-than-small-but-still-small arrays it stops over-allocating, and for medium-sized arrays it probably places them wherever they "may fit". Then, for huge (comparatively speaking) arrays, it allocates in blocks of 4096 bytes.
I guess the 8 bytes and 4096 bytes aren't random. 8 bytes is the size of an `int64` (or `float64` aka `double`) and I'm on a 64-bit computer with Python compiled for 64 bits. And 4096 is the page size of my computer. I assume there are lots of "objects" that have these sizes, so it makes sense that the allocator uses these sizes because it could avoid memory fragmentation.
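If you want to check the page size on your own machine, the standard library exposes it (a small sketch; `mmap.PAGESIZE` works everywhere, `os.sysconf` is Unix-only):

```python
import mmap
print(mmap.PAGESIZE)            # typically 4096 on x86-64 systems

import os
if hasattr(os, 'sysconf'):      # os.sysconf is Unix-only
    print(os.sysconf('SC_PAGE_SIZE'))
```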
You probably know this, but just to make sure: for `O(1)` (amortized) append behaviour, the over-allocation must depend on the size. If the over-allocation is constant, it will be `O(n**2)` (the greater the over-allocation, the smaller the constant factor, but it's still quadratic).
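As a sketch of that reasoning, here is a toy model counting how many elements get copied in total under the two strategies (both growth rules are made up for illustration, not CPython's actual code):

```python
def total_copies(n, grow):
    """Total elements copied during n appends, given a growth
    rule grow(size) -> new capacity (a toy model, not CPython's)."""
    capacity, copied = 0, 0
    for size in range(1, n + 1):
        if size > capacity:
            copied += size - 1      # the old contents must be moved
            capacity = grow(size)
    return copied

for n in (10000, 100000):
    constant = total_copies(n, lambda s: s + 64)  # constant over-allocation
    scaling = total_copies(n, lambda s: 2 * s)    # size-dependent over-allocation
    print(n, constant, scaling)
# the constant strategy grows roughly quadratically, the scaling one linearly
```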
So on my computer the runtime behaviour will always be `O(n**2)`, except for lengths from (roughly) 1 000 to 1 000 000 - there it really seems to be undefined. In my test run it was able to add many (ten-)thousand elements without ever needing a copy, so it would probably "look like `O(1)`" when timed.
Note that this is just my system. It could look totally different on another computer, or even with another compiler on my computer. Don't take these numbers too seriously. I provided the code to do the plots yourself, so you can analyze your own system.
You also asked (in the comments) whether there would be downsides to over-allocating strings. That's really easy: strings are immutable, so any over-allocated byte wastes resources. There are only a few limited cases where a string really does grow in place, and these are considered implementation details. The developers probably won't throw away space to make implementation details perform better, and some Python developers also think that adding this optimization was a bad idea.