1

The quest started from a simple LeetCode problem. I am learning python and trying to solve a problem in leetcode where I have used len() while checking the condition in while loop. I got curious if I write len(nums) in while will my program do more computations. To find this out I started looking for the source code.

while i < len(nums):
       #if both the numbers are same we can pop the ith number
       #else just increase the index and return the length in the end.
       if nums[i] ==  val:
            nums.pop(i)
       else:
           i+=1
return len(nums)

Now, I have 2 question:

  1. How to look for the source code of builtin functions without manually searching the source code on GitHub?
  2. How len function works internally?

I have 2 assumptions for it:

  1. Python treat every thing as an object and they have a property called length(or something like that) and whenever I pop an element from list. This property get decremented by 1.
  2. Another assumption is python in someway iterates over the whole object and return the length.

I got the source code. However, again it's using another function to calculate the length.

static PyObject *
builtin_len(PyObject *module, PyObject *obj)
/*[clinic end generated code: output=fa7a270d314dfb6c input=bc55598da9e9c9b5]*/
{
    Py_ssize_t res;

    res = PyObject_Size(obj);
    if (res < 0) {
        assert(PyErr_Occurred());
        return NULL;
    }
    return PyLong_FromSsize_t(res);
}

@Antti Haapala did a great job in explaining the answer. However, It doesn't answer my question.

These are some of the relevant question that I found on the stack overflow:

How to view source code of function in python?

explanation of C implementation python's len function

Tarster
  • 105
  • 1
  • 1
  • 9
  • 1
    "manually searching the source code on GitHub" – well, I mean, use Github's search to find e.g. `PyObject_Size`'s definition? https://github.com/python/cpython/search?q=pyobject_size – AKX Jul 30 '21 at 11:24
  • https://stackoverflow.com/questions/8608587/finding-the-source-code-for-built-in-python-functions#:~:text=8%20Answers&text=Since%20Python%20is%20open%20source,in%20the%20documentation%20of%20inspect%20. – Eumel Jul 30 '21 at 11:29
  • The complexity of getting the length is O(1) (https://wiki.python.org/moin/TimeComplexity) so the source code is not really important for this matter, it's an efficient operation and that's pretty much it. Of course, it's one more instruction, but I highly doubt one instruction will make you fail the leetcode problem. – Shinra tensei Jul 30 '21 at 11:48

1 Answers1

5

Question 0

I got curious if I write len(nums) in while will my program do more computations.

One aspect of this is documented in the Python wiki's TimeComplexity page for all built-in data structures. len() for a list is O(1).

If you mean something along the lines of

will my program be faster if I do n = len(nums), then manually subtract 1 from n each time I remove from the list

then that's a whole other question, and the answer is likely (measure it!) to be (perhaps somewhat unintuitively) "no", since len() is implemented in C (fast!) and interpreting Python code (n -= 1) and executing it is slower.

Question 1

How to look for the source code of builtin functions without manually searching the source code on GitHub?

As prerequisites, you will need to

  • know how to read C and understand the control flow
  • be able to keep track of the call graph (in your head, in a text file, on a notepad)
  • have an intuition of where you start looking in the source code

GitHub's source code search is, well, passable, but you'll have a better time downloading the source and using a better IDE to jump around in the code.

For built-in functions in modules, I'd start searching for e.g. mathmodule.c for the math functions, etc.

For implementations of objects, there's e.g. listobject.c. It's fairly logical (most of the time).

Question 2

  1. You already found builtin_len.
  2. You can see it calls PyObject_Size. That's defined here.
  3. It does PySequenceMethods *m = Py_TYPE(o)->tp_as_sequence;, i.e. grabs a pointer to the type header of the object, and the "slot" (not to be confused with the Python userland slots) of the sequence-related methods for the object.
  4. If that method collection contains a valid sq_length() function, it is called: Py_ssize_t len = m->sq_length(o); If that length is valid, it is returned, and len() wraps the bare size_t into a Python long object and passes it to you.
  5. If that fails, PyMapping_Size gets called.
  6. It does a similar thing as the sq_length stuff, only using mapping methods, tp_as_mapping and mp_length.
  7. If all that fails, a TypeError is raised using the type_error() helper.

Here in listobject.c, you can see how list_length() is hooked up to be sq_length for list objects.

That function only calls Py_SIZE()[https://docs.python.org/3/c-api/structures.html#c.Py_SIZE], which is a macro to access the ob_size field which all PyVarObjects have.

The documentation on find how Python's list objects use ob_size is here.

As for how a custom type with __len__ hooks up into all of this, my recollection is that objects with __len__ will have their sq_length() call the Python callable, if one exists, and that value is then "trampolined" back through the C code back to your Python code.

AKX
  • 152,115
  • 15
  • 115
  • 172
  • Thanks, It's a great answer. As you have mentioned I had implemented it using the manual method as well in my submission. However, not shared it here. LeetCode displayed execution time is not reliable and I don't know how to test the speed of a program. So, as of now I can't test it. I am happy to know in short that len takes O(1) rather than O(n). Follow up question: As the time displayed on the website is for list or in general for any type of built in data type. – Tarster Jul 30 '21 at 12:24
  • 2
    "I don't know how to test the speed of a program." The simplest is the [built-in `timeit` module](https://docs.python.org/3.9/library/timeit.html#module-timeit). – AKX Jul 30 '21 at 12:39
  • "As the time displayed on the website is for list or in general for any type of built in data type." The page has subheaders for lists, deques, dicts, and sets. – AKX Jul 30 '21 at 12:39
  • Lol, I totally get there are subheaders, However length is not mentioned in other data types and what about tuples. – Tarster Jul 30 '21 at 12:44
  • 2
    Well, my post did give you the tools to figure that out yourself, but: tuple: `Py_SIZE()`, constant. dict: `((PyDictObject *)mp)->ma_used`, constant. set: `(((PySetObject *)(so))->used`, constant. deques: `Py_SIZE()`, constant. – AKX Jul 30 '21 at 12:51