Comparing efficiency of two substring searching methods in Python

Question

After reading up on substring searching in Python (link1, link2), I found two obvious solutions:

str1 = 'Hi there'
str2 = 'Good bye'
# 1
if str1.find('Hi') != -1:
    print('Success!')
# 2
if 'Good' in str2:
    print('Success')

Is there a difference in the code generated by these two, or is the second one just syntactic sugar?
Is one more efficient than the other?
Is there a third option?

In terms of complexity they are both `O(n)`; use the one that makes sense for a given scenario. If you want to get an index, use `find`; if you want to test membership, use `in`. - Joran Beasley, Feb 16 '14 at 21:23

score 2 · Accepted Answer · answered Feb 16 '14 at 21:33

You can check what the bytecode looks like for those conditions:

In [1]: import dis

In [2]: dis.dis(lambda: 'Hi' in x)
  1           0 LOAD_CONST               1 ('Hi') 
              3 LOAD_GLOBAL              0 (x) 
              6 COMPARE_OP               6 (in) 
              9 RETURN_VALUE         

In [3]: dis.dis(lambda: x.find('Hi') != -1)
  1           0 LOAD_GLOBAL              0 (x) 
              3 LOAD_ATTR                1 (find) 
              6 LOAD_CONST               1 ('Hi') 
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair) 
             12 LOAD_CONST               3 (-1) 
             15 COMPARE_OP               3 (!=) 
             18 RETURN_VALUE

As you can see, the `find` version does a lot more; in particular, it performs an attribute lookup and a function call, neither of which is needed for the `in` operator.
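The same comparison can be reproduced outside an interactive session. The following is a minimal sketch, assuming Python 3; `opnames` is a helper name made up for this example, and the exact opcode names it prints vary between interpreter versions, so only the relative instruction counts are meaningful.

```python
import dis

def opnames(func):
    """Return the bytecode instruction names of func, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

# x is deliberately left undefined: the lambdas are only disassembled,
# never called, so the global lookup is never executed.
in_ops = opnames(lambda: 'Hi' in x)
find_ops = opnames(lambda: x.find('Hi') != -1)

print(len(in_ops), in_ops)
print(len(find_ops), find_ops)
```

On any version, the `find` variant should list more instructions, including one that looks up the `find` attribute.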

I would also say that `in` makes it explicit that you are checking for the presence of a substring and not for its position, so it is more readable.

In terms of speed, the two should be practically equal for strings of any reasonable size. Only for very small strings does the extra attribute lookup and call have a noticeable relative impact, but in that case the condition is checked very quickly anyway.
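If you want to check this on your own machine, a rough measurement with `timeit` could look like the sketch below. It assumes Python 3.5 or later (for the `globals` argument); the sample string and the repetition count are arbitrary choices for illustration, and the absolute numbers depend on the interpreter and hardware.

```python
import timeit

# Illustrative input: the same short string as str1 in the question.
namespace = {'haystack': 'Hi there'}
repeats = 200000  # arbitrary; raise it for more stable numbers

t_in = timeit.timeit("'Hi' in haystack", globals=namespace, number=repeats)
t_find = timeit.timeit("haystack.find('Hi') != -1", globals=namespace, number=repeats)

print('in:   %.4f s for %d runs' % (t_in, repeats))
print('find: %.4f s for %d runs' % (t_find, repeats))
```

Replacing the sample with a much longer string lets you see how the fixed call overhead compares with the time spent in the search itself.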

The third option would be to use `index` and catch the exception:

try:
    string.index(substring)
except ValueError:
    # not found (str.index raises ValueError, not IndexError)
    pass
else:
    # found
    pass

Although this cannot be expressed as a simple expression.
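One way to get an expression back is to wrap the try/except in a small function. This is only a sketch: `contains_via_index` is a hypothetical helper name, not something from the standard library. It relies on `str.index` raising `ValueError` when the substring is missing.

```python
def contains_via_index(string, substring):
    """Hypothetical helper: membership test built on str.index."""
    try:
        string.index(substring)
    except ValueError:
        # str.index signals "not found" with ValueError
        return False
    else:
        return True

print(contains_via_index('Hi there', 'Hi'))   # True
print(contains_via_index('Good bye', 'Hi'))   # False
```

In practice this gives the same answer as `substring in string`, so it is mainly useful when you also need the index that `index` returns.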

rlms · Answer 2 · 2014-02-17T10:07:32.907

0

The second isn't just syntactic sugar for the first. `str.find` is just a method call, whereas `a in b` calls `b.__contains__(a)`. I don't think there are any differences in speed.

I would recommend the second, as it is more Pythonic:

It is more readable, and it uses duck typing: the string could be replaced by a different container or iterable, and the code would still work.
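To make the duck-typing point concrete, here is a small sketch. `has_item` and `EvenNumbers` are names invented for this example; the only thing assumed about the container is that it supports the `in` operator.

```python
class EvenNumbers:
    """Hypothetical container that 'contains' every even integer."""
    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0

def has_item(container, item):
    # Works for anything that supports the in operator.
    return item in container

print(has_item('Good bye', 'Good'))        # True: substring test on a str
print(has_item(['Good', 'bye'], 'Good'))   # True: element test on a list
print(has_item(EvenNumbers(), 4))          # True: custom __contains__
print('Good bye'.__contains__('Good'))     # True: what the first call dispatches to
```

Note that the meaning of `in` depends on the container: for a string it is a substring test, while for a list it compares whole elements, so `has_item(['Good bye'], 'Good')` is `False`.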

edited Feb 17 '14 at 10:07

answered Feb 16 '14 at 21:30
