15

I am trying to find the implementation of the built-in in operator in the (C) Python source code. I have searched in the built-in functions source code, bltinmodule.c, but cannot find the implementation of this operator. Where can I find this implementation?

My goal is to improve the sub-string search in Python by extending different C implementations of this search, although I am not sure if Python already uses the idea I have.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Michael
  • 2,827
  • 4
  • 30
  • 47

2 Answers2

42

To find the implementation of any python operator, first find out what bytecode Python generates for it, using the dis.dis function:

>>> dis.dis("'0' in ()")
  1           0 LOAD_CONST               0 ('0')
              2 LOAD_CONST               1 (())
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE

The in operator becomes a COMPARE_OP byte code. Now you can trace how this opcode is being handled in the Python evaluation loop in Python/ceval.c:

TARGET(COMPARE_OP)
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *res = cmp_outcome(oparg, left, right);
    Py_DECREF(left);
    Py_DECREF(right);
    SET_TOP(res);
    if (res == NULL)
        goto error;
    PREDICT(POP_JUMP_IF_FALSE);
    PREDICT(POP_JUMP_IF_TRUE);
    DISPATCH();

cmp_outcome() is defined in the same file, and the in operator is one of the switches:

case PyCmp_IN:
    res = PySequence_Contains(w, v);
    if (res < 0)
         return NULL;
    break;

A quick grep shows us where PySequence_Contains is defined, in Objects/abstract.c:

int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
    if (sqm != NULL && sqm->sq_contains != NULL)
        return (*sqm->sq_contains)(seq, ob);
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}

PySequence_Contains thus uses the sq_contains slot on the Sequence object structure or an iterative search otherwise, for Python C objects.

For Python 3 Unicode string objects, this slot is implemented as PyUnicode_Contains in Objects/unicodeobject.c, in Python 2 you also want to check out string_contains in Objects/stringobject.c. Basically just grep for sq_contains in the Objects/ subdirectory for the various implementations by the different Python types.

For generic python objects, it's interesting to note that Objects/typeobject.c defers this to the __contains__ method on custom classes, if so defined.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • 1
    @Martijn Pieters: i went through the code of string contains for python 2.7, it uses PyUnicode_Contains that uses [stringlib_contains_obj] (http://hg.python.org/cpython/file/2370e331241b/Objects/unicodeobject.c#l6223), that uses [stringlib_find] (http://hg.python.org/cpython/file/2370e331241b/Objects/stringlib/find.h#l85) that uses [fastsearch] (http://hg.python.org/cpython/file/2370e331241b/Objects/stringlib/fastsearch.h#l37) that compares only the first letter of the substring, something is wrong their or i am missing something (and it doesn't use Boyer–Moore as i thought)? – Michael Sep 04 '12 at 07:02
  • 1
    @Michael: It doesn't use Boyer-Moore, it uses an algorithm *inspired* by B-M, see http://effbot.org/zone/stringlib.htm. It's simplified B-M with some Horspool and Sunday thrown in. – Martijn Pieters Sep 04 '12 at 07:15
0

In python 3.9, the bytecode COMPARE_OP was split into four distinct instructions, and CONTAINS_OP for 'in' and 'not in' tests.

That did not change the implementation of in operator, it's directly handled in Python/ceval.c, calling PySequence_Contains().

ofey404
  • 31
  • 5