-2

In both statements, I am appending a character "a" to the string s:

  1. s += "a"
  2. s = s + "a"

Which statement has the better time complexity in Python?

ruohola
  • 21,987
  • 6
  • 62
  • 97

2 Answers2

8

They have the same time complexity.

In the general Python standard defined case: They both have the same time complexity of O(n), where n is the length of string s.

In practice, with the CPython implementation: They can in some cases both be O(1), because of an optimization that the interpreter can do when it detects that s is the only reference to the string in question.

Demo (using Python 3.10.1):

O(1) (optimization in play):

String of length 10⁹, using +=:

$ python -m timeit --setup='s = "s" * 10**9' 's += "a"'
5000000 loops, best of 5: 96.6 nsec per loop

String of length 10⁹, using +:

$ python -m timeit --setup='s = "s" * 10**9' 's = s + "a"'
5000000 loops, best of 5: 95.5 nsec per loop

String of length 1, using +=:

$ python -m timeit --setup='s = "s"' 's += "a"'
5000000 loops, best of 5: 97 nsec per loop

String of length 1, using +:

$ python -m timeit --setup='s = "s"' 's = s + "a"'
5000000 loops, best of 5: 97.9 nsec per loop

O(n) (optimization doesn't apply):

String of length 10⁹, optimization doesn't apply, using +=:

$ python -m timeit --setup='s = "s" * 10**9; b = s' 's += "a"'
1 loop, best of 5: 440 msec per loop

String of length 10⁹, optimization doesn't apply, using +:

$ python -m timeit --setup='s = "s" * 10**9; b = s' 's = s + "a"'
1 loop, best of 5: 445 msec per loop

String of length 1, optimization doesn't apply, using +=:

$ python -m timeit --setup='s = "s"; b = s' 's += "a"'
5000000 loops, best of 5: 85.8 nsec per loop

String of length 1, optimization doesn't apply, using +:

$ python -m timeit --setup='s = "s"; b = s' 's = s + "a"'
5000000 loops, best of 5: 84.8 nsec per loop

More info about the time complexity of string concatenation: https://stackoverflow.com/a/37133870/9835872 https://stackoverflow.com/a/34008199/9835872

ruohola
  • 21,987
  • 6
  • 62
  • 97
  • 1
    A note: The optimization in CPython isn't actually in `str.__add__` or `str.__iadd__` (the latter doesn't even exist; `__add__` is seamlessly used when `+=` is used and `__iadd__` doesn't exist). The optimization is baked into the interpreter's bytecode evaluation loop which checks for update-in-place cases, performs some evil reference manipulation if so, then devolves to `PyUnicode_Append`, which recognizes when it can modify in place (only because the interpreter cleared the target reference first). The *actual* `str.__add__` doesn't use `PyUnicode_Append`, and *can't* use the optimization. – ShadowRanger Dec 27 '21 at 20:39
  • Thanks @ShadowRanger, fixed my answer a bit! Really interesting stuff – ruohola Dec 27 '21 at 20:41
  • 1
    You're welcome! Note: Even that explanation will change in the future; it's accurate as of 3.10, but it looks like 3.11 will do some *crazy* stuff with runtime adaptive bytecodes and the like (`BINARY_ADD` goes away, in favor of `BINARY_OP_ADD_INT`, `BINARY_OP_ADD_FLOAT` and `BINARY_OP_ADD_UNICODE`, which can all devolve to a plain `BINARY_OP` pathway when exact types don't match), and it looks like, at least at time of writing, they completely removed hack to operate in place (even for `BINARY_OP_ADD_UNICODE`), so this code may get *much* slower in 3.11 unless they add the hack back in. – ShadowRanger Dec 27 '21 at 20:46
  • 1
    Looks like the optimization still exists, I just missed the addition of `BINARY_OP_INPLACE_ADD_UNICODE`, which uses the same hackery as the older code path, so the performance should remain the same (or be slightly better by only even trying it when the code path is one that can actually trigger the optimization). The optimization does look like it might be disabled for the `s = s + "a"` case though (unless the bytecode compiler actually recognizes that pattern and replaces it with `s += "a"`, but that seems stupid dangerous since it would change behavior for non-strings). – ShadowRanger Dec 27 '21 at 20:59
1

Strings in Python are immutable, so whenever you "append" to s, Python makes a copy of s and appends the new character, effectively taking O(n) time complexity for both.

imaique
  • 60
  • 2