2

I have the following code:

>>> s = "stackoverflow"
>>> tmp = s[0]
>>> s = s[1:]
>>> s
'tackoverflow'

I am performing this "remove first" sort of operation repeatedly and am trying to determine how it affects performance.

Is a single slice operation like the one shown above O(1)? If it is not O(1), is there a way that I can perform the operation with a complexity of O(1)?

Cody Gray - on strike
  • 239,200
  • 50
  • 490
  • 574
Coder
  • 1,175
  • 1
  • 12
  • 32
  • Removing first char needs to move memory of whole string one byte to the left. So it needs O(n) time, if n is length of string. – Arty Dec 27 '20 at 11:52
  • To make complexity O(1) specifically for this operation you need to have special String implementation, which keeps original (unmodified) array and just changes begin/end pointer. This is called lightweight array slicing. Unfortunately Python string doesn't have such lightweight slicing operation, it needs to move whole array. But Numpy arrays do such light slicing, so try to use them, [here is the doc](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html). – Arty Dec 27 '20 at 11:55
  • 1
    Does this answer your question? [Time complexity of string slice](https://stackoverflow.com/questions/35180377/time-complexity-of-string-slice) – flyingdutchman Dec 27 '20 at 11:59
  • that answers the time complexity, which wasnt the main point of the question — I had already figured that and was asking for confirmation, with the purpose of **how can I make it O(1)** – Coder Dec 27 '20 at 12:15
  • @JohnD To make it O(1) it is not enough to use just python's `str()`, because with str it will be always O(n). You need some special implementation for that, special in sence that besides char array your implementation Class should store also `begin`, `end` and maybe `stride`. At the beginning begin=0, end=n, and after slicing s[1:] will be begin=1, end=n. I.e. such Class will keep whole original array of chars, but after slicing just move-update begin/end pointers. – Arty Dec 27 '20 at 14:54
  • If you don't need the intermediate string values, `for tmp in reversed(s):` would be O(1) for each `tmp`, and O(n) for the full for loop. – tdelaney Dec 27 '20 at 15:42
  • @tdelaney i do need the intermediate values but i think that does work with saving intermediate values isnt `x=s[0]` ∈ O(1)? – Coder Dec 28 '20 at 03:31
  • @John - Yes. I was also toying with `s = list(reversed(s))` and then you could `s.pop()` from the right side so there isn't a copy. Not sure that's any better. – tdelaney Dec 28 '20 at 04:34
  • hmm i tried that but seems like you cant do `s.pop()` if `s` is of type `str`, only if `s` is of type `list` – Coder Dec 28 '20 at 10:17

1 Answers1

3

Python's str() does slicing s[1:] in O(n) time, because str() is immutable and any slicing operations create whole new string and copies all slice data, so for string of length N this operations is O(N).

To do any slicing in O(1) time you need to use special implementations, like Numpy has for its arrays. Numpy arrays keep pointer to allocated array plus three extra fields begin, end, stride.

Before slicing Numpy's array having N elements will start with begin=0, end=N. After slicing e.g. a[1:], unlike str() Numpy array doesn't copy any memory but just changes begin/end/stride. Memory pointer and size remain the same. Thus after a[1:] we will have begin=1, end=N. After one more a[1:] we will have begin=2, end=N.

For strings you can convert them one time at program beginning to Numpy arrays of uint32 elements (because Unicode code point is 32-bit at current time). Then do all the operations inside your program only in Numpy arrays. Numpy also has many string operations implemented for its arrays.

You can do slicing and many other operations very fast, much faster than working with Python's str(). And all slicing operations are done in O(1) time. After you're done with Numpy operations you convert back to Python's str().

Next follows example of code. It first converts Python str() to numpy array. Then does any operations including slicing (slicing is done in O(1) time on numpy arrays). And when we're finished numpy array is converted back to Python's str().

Try it online!

import numpy as np
s = 'abcde'
# Convert string to Numpy array
a = np.frombuffer(s.encode('utf-32-le'), dtype = np.uint32)
# Do any slicing operations in O(1) time
a = a[1:]
a = a[1:]
# Convert Numpy array to string
s = a.tobytes().decode('utf-32-le')
print(s)

Output:

cde
Arty
  • 14,883
  • 6
  • 36
  • 69
  • 1
    thanks! unfortunate that no native solution exists without numpy but this definitely answers my question, accepted! – Coder Dec 28 '20 at 03:23