15

What is the big Oh notation for str.replace function in Python ?

Is it always O(n) ?

str = "this is string example"
print str.replace("is", "was")
thwas was string example
Vincent Savard
  • 34,979
  • 10
  • 68
  • 73
Mohamad Zein
  • 777
  • 1
  • 8
  • 33

3 Answers3

12

Big O notation is calculated at worst-case scenario, and Python sources for worst case do just 'find next position of substr, replace, and go further'. One replacement does O(n) operations (copying the string). One search, according to http://effbot.org/zone/stringlib.htm, in worst-case scenario does O(n*m) operations. And since it can be up to n/m replacements, in total it should be surprisingly O(n*n).

Nickolay Olshevsky
  • 13,706
  • 1
  • 34
  • 48
  • 1
    If there are n/m replacements, do you really think each takes nm time to be found? I think it's just m time. I think overall time complexity is O(nm), not just O(nn). – superb rain Aug 11 '20 at 17:05
  • 2
    This analysis is wrong. First, big O it is just a notation for mathematical functions. When applied to algorithms we can mean worst, average or best case. Secondly, the worst case search can be `O(k*m)` where `k` is the number of letters you go through to find the next match. Therefore finding all matches is `O(n*m)` worst case. Average is `O(n)`. If you get to use Boyer-Moore, the search is `O(n/m)`. (Sadly, unicode and Boyer-Moore are not friends.) – btilly Feb 11 '21 at 22:33
  • "we can mean worst, average or best case" – while *technically* true, in almost any non-academic setting, people tend to use big O only to mean worst-case. I do love when people inquire about average growth, though! – Leland Sep 16 '22 at 20:46
7

I coded up a test for what I believe is the worst case scenario - a string repeated over and over, and we're replacing said string with another string. Because t/n levels off as n increases, worst case scenario seems empirically like it may be O(n). But I really can't argue with @NickolayOlshevsky 's post.

import time
from matplotlib import pyplot as plt

x=[10]
while x[-1]<10**8:
    x.append(int(x[len(x)-1]*1.5))

y = [0]*len(x)

nst = 'abcd'
sst = 'abcd'

for ix,i in enumerate(x):
    s = ''.join([nst]*i)
    t = time.time()
    s = s.replace(sst,'efgh')
    y[ix] = time.time()-t

x = [a*len(nst) for a in x]

%matplotlib inline
fig, (ax1,ax2) = plt.subplots(2, sharex=True)
fig.set_size_inches(8, 6)
ax1.set_xscale('log')
ax1.set_yscale('log')
ax1.set_xlabel('n')
ax1.set_ylabel('t')
ax1.plot(x,y)
ax2.set_xscale('log')
ax2.set_yscale('log')
ax2.set_xlabel('n')
ax2.set_ylabel('t/n')
ax2.plot(x,[a/b for a,b in zip(x,y)])

n vs t

Charlie Haley
  • 4,152
  • 4
  • 22
  • 36
  • Hm. In such scenario searching for the next occurrence of the string will be (almost) constant - since it goes right after the previous occurrence. And since string being searched is small then it would be O(n). Worse scenario should be with long substring + main string with substrings which 'almost' match. I.e. l123121212121212123 while searching for 123. – Nickolay Olshevsky Feb 23 '16 at 19:54
  • Ah, logical. Or even worse potentially, searching for `12121212123` in a string like `12121212121212121212123`. I tested it out, and doing this unfortunately has little to no effect on the plot. – Charlie Haley Feb 23 '16 at 20:19
1

While technically, in mathematics, Big O notation is used to capture best, average and worst case runtimes of a function...in the programming world, worst case or upper bound is generally what is being considered in regards to Big O runtime, because it is the greatest concern for scalability and because it's brief. While this may be incorrect usage of the term or may seem vague, it's just become a quicker way in the industry to address this concern or reference worst case.

If you look at the Python 3.11 source code (Python replace method), you may see this has an average case of O(N) and a worst case of O(N+M), where N is the length of the string and M is number of occurrences of the substring we're looking to replace. The method was updated in v3.10 to use heuristics to choose between the better of two search algorithms. This post explains it in more detail. Note: it addresses str.replace() and several other string methods.

General Grievance
  • 4,555
  • 31
  • 31
  • 45
skilbur
  • 11
  • 2