0

I am wondering, why when looking for a specific string in a larger string in python, looking in a string is much faster than a list.

str_test = "some string words "*100
def search_in_string():
    if "with" in (str_test):
        return True
    
ls_test = ["some", "string" "words"]*100
def search_in_list():
    if "with" in (ls_test):
        return True

import timeit
print(timeit.timeit(search_in_string))
### 0.3497438999984297   

print(timeit.timeit(search_in_list))
### 2.4319190999995044

It looks like the search in string is almost 7 times faster

Kelly Bundy
  • 23,480
  • 7
  • 29
  • 65
cLwill
  • 102
  • 6
  • Check https://stackoverflow.com/questions/13884177/complexity-of-in-operator-in-python –  Aug 24 '21 at 16:47
  • 1
    [Which is faster?](https://ericlippert.com/2012/12/17/performance-rant/) – Cory Kramer Aug 24 '21 at 16:47
  • 1
    @CoryKramer I raise. ["Ha ha, nested for loops go brrrrrrrrrrrr…"](https://ericlippert.com/2020/03/27/new-grad-vs-senior-dev/) – Alexander Aug 24 '21 at 16:48
  • 1
    Your question title and your question body don't match. Your title is asking which is faster, but the body is asserting that the substring-searching is faster, and asking for an explanation. What are you trying to ask? – Alexander Aug 24 '21 at 16:58
  • The second is going through all items in a list and comparing equality of each (as others have said, it's basically nested for loops - go over every element in that list and compare it character-by-character to `i`). The first is using substring checking which can be done (and is probably done) in linear fasion (hashing). – h4z3 Aug 24 '21 at 17:04
  • @KellyBundy: For a simple comparison, I just matched up two snippets with IPython `%timeit` magic: `i = 'foo'`, `s = "some string with words " * 1000`, `lst = "some string with words".split() * 1000`, then `%timeit i in s` and `%timeit i in lst`. The check in the `list` took nearly 10x as long. – ShadowRanger Aug 24 '21 at 17:06
  • @ShadowRanger Thanks. I didn't doubt it, I'd just like the claim to be accompanied with a demonstration. In my opinion one doesn't ask "why" before establishing that it's in fact true. – Kelly Bundy Aug 24 '21 at 17:21

1 Answers1

4

Ripping through a single string at the C layer is cheap. Performing rich comparisons with many many objects (each of which has to go through its own chain of function calls to perform the equality check) has higher overhead. It's not a matter of big-O at all here (if i is of a length not found in any element of the list, then the work is O(n) for the list check, just as an optimized string search would be, as all the individual string comparisons can immediately return false based on the length check without reading their contents), it's just that the in list check has to look up and invoke __eq__ 40,000 times, and in str doesn't.

Of course, the behavior is also different; "foo" in "seafood is great" will return True, while "foo" in ["seafood", "is", "great"] will return False. In real code, if this test had to be performed many times, and you needed to match whole words, you'd usually just construct a set up front (O(n)) and reduce the per-check O(n) problem to a per-check (roughly) O(1) problem.

ShadowRanger
  • 143,180
  • 12
  • 188
  • 271