2

As far as I understand, when I do 'foo' in 'abcfoo' in Python, the interpreter tries to invoke 'abcfoo'.__contains__('foo') under the hood.

This is a string matching (aka searching) operation that accepts multiple algorithms, e.g.:

enter image description here

How do I know which algorithm a given implementation may be using? (e.g. Python 3.8 with CPython). I'm unable to this information looking at e.g. the source code for CPython for string. I'm not familiar with its code base, and e.g. I can't find the __contains__ defined for it.

pygeek
  • 7,356
  • 1
  • 20
  • 41
Amelio Vazquez-Reina
  • 91,494
  • 132
  • 359
  • 564
  • Implementation details probably shouldn't matter for your program. If they do, then you may have a design problem and might want to write an extension or implement the algorithm yourself, possibly in a different language or using a library. If you're just curious, then see [this thread](https://stackoverflow.com/questions/8608587/finding-the-source-code-for-built-in-python-functions). – ggorlen Apr 12 '20 at 19:30
  • 1
    Does this answer your question? [Finding the source code for built-in Python functions?](https://stackoverflow.com/questions/8608587/finding-the-source-code-for-built-in-python-functions) – ggorlen Apr 12 '20 at 19:30
  • 1
    You're looking in the wrong file - see [here](https://github.com/python/cpython/blob/v3.8.2/Objects/unicodeobject.c#L11215-L11273). – wim Apr 12 '20 at 19:38
  • @ggorlen I appreciate your comment, but respectfully disagree. The runtime of a method can be an important consideration. I don't think this is an implementation detail that a user of a method _should_ ignore. – Amelio Vazquez-Reina Apr 12 '20 at 19:39
  • 1
    @AmelioVazquez-Reina: Note that if your strings are huge then it may make sense to investigate, otherwise if you care about raw processing speed then Python is probably not the best language to consider. Who chooses Python doesn't do that because of its speed. – 6502 Apr 12 '20 at 19:47
  • Oh, I'm definitely all for knowing the time complexity that the contract for the interface might guarantee (there are no guarantees of runtime, that's machine-specific), but beyond that, if your program makes so many assumptions about implementation details that could change from version to version, that's a strong indicator that something has gone wrong with the design. If you have a performance problem and you're resorting to comparing how some particular version of CPython has implemented Boyer-Moore instead of Rabin-Karp (or whatever), something has probably gone wrong. – ggorlen Apr 12 '20 at 19:47
  • Thanks @ggorlen ok but note that just becase a method can guarantee a certain type of interface (e.g. a function signature), it does **not** mean that it **cannot** provide other types of guarantees. Libraries **can** be designed and built for specific runtime / scale usage considerations and declare a runtime contract they are meant to fulfill. – Amelio Vazquez-Reina Apr 12 '20 at 19:52
  • Yes, an interface will usually guarantee a time complexity which is what you'll want to be concerned with, but how an interface chooses to meet that time complexity is an implementation detail and subject to change. For example, [CPython dictionaries](https://stackoverflow.com/questions/327311/how-are-pythons-built-in-dictionaries-implemented) underwent a major redesign in CPython 3.6. It's dangerous to make assumptions based on implementation details. – ggorlen Apr 12 '20 at 20:06

1 Answers1

3

According to the source code:

/* fast search/count implementation, based on a mix between boyer-
   moore and horspool, with a few more bells and whistles on the top.
   for some more background, see: http://effbot.org/zone/stringlib.htm */

That link suggests that it's used in find, index, split, replace, __contains__, although it may be outdated by now.

  • +1 Excellent. Thanks Iván! (for reference for others for more info on the above see e.g. [Boyer-Moore](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm) and [Boyer–Moore–Horspool](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm)). – Amelio Vazquez-Reina Apr 12 '20 at 19:45
  • 1
    @ggorlen Changed it –  Apr 12 '20 at 20:37