2

In C++ if I were to remove the first character from a string it would look something like:

string s = "myreallylongstring";
s = s.substr(1);

and this would be O(1). [correct me if I'm wrong about that]

However in Python's "immutable string" world, does this code run in O(n)?

s = "myreallylongstring"
s = s[1:]

Would it be any faster if I used a list of chars instead?

David McNamee
  • 403
  • 4
  • 10
  • 5
    `substr` in C++ is not O(1). A new string is constructed. – cigien Sep 25 '21 at 01:46
  • 1
    `substr` is `O(n)`. Even if you know `n` happens to be `1` in this case, it is still `O(n)` complexity. Big-O notation describes complexity growth as `n` tends towards infinity. A big-O complexity of `O(1)` strictly means that complexity is independent of the given variable. – François Andrieux Sep 25 '21 at 01:50
  • `std::basic_string::substr()` has complexity which is linear in the length of the substring being extracted. Extracting all except the first character would mean complexity O(length) where `length` is the original length of the string. `std::basic_string::operator=()` is also linear in the length of string being assigned. Net effect: `s=s.substr(1)` is O(length). – Peter Sep 25 '21 at 01:52
  • 1
    @FrançoisAndrieux The `1` in `s.substr(1)` is the index of the first character to be extracted, not the length of the string being extracted. – Peter Sep 25 '21 at 01:54
  • Note: The title question here is a duplicate of [Time complexity of string slice](https://stackoverflow.com/q/35180377/364696) (which I apparently answered, but forgot about in the last 3 normal years + 1.5 COVID years, the latter of which count as 15 regular years when it comes to remembering the before times). The extra questions (is the OP right about C++ `substr` being faster on big-O, would a Python `list` be more efficient) are not. I won't use my C++ & Python dupehammer, but I won't object if someone else thinks it's a dupe. – ShadowRanger Sep 25 '21 at 03:25

2 Answers2

5

Slicing any built-in type in Python (aside from memoryview) is O(n) in the general case (main exception being a complete slice of immutable instances, which usually returns the original instance without copying it, since it's not needed).

A list of characters wouldn't help; removing a character from the beginning of a list is O(n) (it has to copy everything above it down a slot). A collections.deque could conceivably improve the big-O (they can do O(1) pops from either end), but it would also consume substantially more memory, and more fragmented memory, than the string.

For all but the largest strings, you're usually okay even with these O(n) inefficiencies, so unless you've actually profiled and found it to be a cause of problems, I'd stick with the slicing and let it be.

That said, you're wrong about C++; s = s.substr(1) is no different from Python's s = s[1:]. Both of them end up copying the entire string save the first character, C++ move assigns back to the original string, while Python replaces the original name binding to the old object with the new one (functionally pretty similar operations). s.substr(1) isn't even using std::string's mutability features; s.erase(0, 1) would in fact erase in-place, mutating the original string, but it would still be O(n) because all the other characters would have to be copied down into the space previously consumed by the deleted characters (no std::string implementation that I know of allows for the storage of an offset from the beginning of the string to find the real data, the pointer to the data points to the first allocated byte at all times).

ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
  • There's another built-in type that slices in O(1), Shadow**Range**r. – no comment Sep 25 '21 at 19:41
  • @don'ttalkjustcode: Hah, you're correct. I use that feature myself (easier to define `range(10)[::-1]` than `range(9, -1, -1)`), but it's such a limited purpose type with so little flexibility (and if you use `enumerate`/`zip`/`itertools` appropriately, almost never needed) that I don't think of it much. :-) – ShadowRanger Sep 26 '21 at 01:17
  • Yeah, I've used `[::-1]` on them before as well, but might've used other slicings on them just as often as I've used `memoryview`, i.e., never (other than testing whether it indeed works). – no comment Sep 26 '21 at 10:58
3

this answer adresses the C++ part of the question

s = s.substr(1);

Creates a new temporary string and copies it to s. The operation is O(n).

A better way to do it is:

s.erase(s.begin());

This is still O(n) but avoid creating a new object and the copying.

A better way is to use std::string_view:

auto sv = std::string_view{s.begin() + 1, s.end()}.

This is O(1) as it doesn't create or alter any string, it just creates a view into an existing string.

bolov
  • 72,283
  • 15
  • 145
  • 224
  • This is not an answer to the OP's question about Python, just correcting their misconception about C++. Also, does `std::string` have a `remove` method? I just see `erase` in [the docs](https://en.cppreference.com/w/cpp/string/basic_string). `erase` could be used with `s.begin()` the way you're doing it, or with `0, 1` (for delete from index 0 for 1 character). – ShadowRanger Sep 25 '21 at 01:57
  • @ShadowRanger yes, I indeed talk just about the C++ part. Yes, `remove` was a mistake, it's `erase`. – bolov Sep 25 '21 at 03:26