2

In python, let's say we have the following list: my_list = ['a', 'b', 'c', 'd', ...]

Would my_list.pop(3) be more efficient than my_list.remove('d')?

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
Ben Sandler
  • 2,223
  • 5
  • 26
  • 36
  • 2
    Have you tried timing it yourself with `timeit`? – Morgan Thrapp Aug 18 '15 at 16:43
  • 4
    Just because something times better under some arbitrary input doesn't mean that it is more efficient. I am looking for a more theoretical answer. – Ben Sandler Aug 18 '15 at 16:50
  • 1
    They're not equivalent - do you already know the index, or the item? I would expect the effort of moving everything else in the array around to dominate over any slight differences. – jonrsharpe Aug 18 '15 at 17:01

3 Answers3

10

According to this answer, the complexity of these two methods is:

pop     O(n - i)
remove  O(n)

with the list length n and index of the element i.

So, depending on the size of your list, pop might be faster on the long run.

Community
  • 1
  • 1
Falko
  • 17,076
  • 13
  • 60
  • 105
  • 2
    In practical terms, though, I doubt you're going to be beating the constant term with any day-to-day value of `n`. – jonrsharpe Aug 18 '15 at 17:14
6

It doesn't matter much for small lists:

[wander@box ~]$ python -m timeit '[1, 2, 3].pop(1)'
10000000 loops, best of 3: 0.167 usec per loop
[wander@box ~]$ python -m timeit '[1, 2, 3].remove(2)'
10000000 loops, best of 3: 0.129 usec per loop

If your lists are really large, or if it takes a long time to compare elements in your lists, there might be more of a difference. Remove will be slower, as it has to go through (and compare) all the elements:

[wander@box ~]$ python -m timeit -n 100000 'list(range(1, 100)).pop(98)'
100000 loops, best of 3: 0.916 usec per loop
[wander@box ~]$ python -m timeit -n 100000 'list(range(1, 100)).remove(98)'
100000 loops, best of 3: 2.05 usec per loop

And that's with integers, which are quite fast to compare. If the elements in your lists have more interesting __eq__ methods, remove can take a long time:

class Foo:
    def __eq__(self, other):
        time.sleep(1)
        return False

[Foo(), Foo(), Foo(), Foo(), 20].remove(20)

So, if you know the index, go with pop.

Wander Nauta
  • 18,832
  • 1
  • 45
  • 62
4

Looking at the actual C code (you meant for CPython, right?) for listpop and listremove, you can see that:

  1. .remove needs to iterate over the list (and therefore scales by O(i) where i is the index of the item);

  2. .pop takes shortcuts, as:

    • it's trivial to figure out whether the index is out of bounds (line 928), but .remove has to check through the whole list to find its target (or fail to do so); and

    • it special-cases the index being the last item in the list (line 933);

  3. listremove has a call to PyObject_RichCompareBool (line 2200), which is relatively expensive, because it needs to check whether the object at the current index is equal to the item; and

  4. both (except in the special case for .pop mentioned above) ultimately delegate to list_ass_slice (lines 941 and 2202 - and no giggling at the back) once the appropriate slicing location is found; this has to shuffle the items in the rest of the array along, so will be O(n - i).

On that basis .pop will be faster, especially for the last item in the list; however, if you've started with the item itself and have already had a O(n) operation with a rich comparison to find its .index, what you gain on the swings you already lost on the roundabout.

Also, the operation of rearranging everything that's left in the array (i.e. list_ass_slice) to make up for what you've removed, which both .pop and .remove need to do, is likely to be far more expensive than figuring out which item to remove in the first place.

Note that, without diving into the source code, pretty much all of the above could be figured out just by thinking logically about what each operation involves.

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437