
Using cProfile:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   17.834   17.834 <string>:1(<module>)
        1    0.007    0.007   17.834   17.834 basher.py:5551(_refresh)
        1    0.000    0.000   10.522   10.522 basher.py:1826(RefreshUI)
        4    0.024    0.006   10.517    2.629 basher.py:961(PopulateItems)
      211    1.494    0.007    7.488    0.035 basher.py:1849(PopulateItem)
      231    0.074    0.000    6.734    0.029 {method 'sort' of 'list' objects}
      215    0.002    0.000    6.688    0.031 bosh.py:4764(getOrdered)
     1910    3.039    0.002    6.648    0.003 bosh.py:4770(<lambda>)
      253    0.178    0.001    5.600    0.022 bosh.py:3325(getStatus)
        1    0.000    0.000    5.508    5.508 bosh.py:4327(refresh)
     1911    3.051    0.002    3.330    0.002 {method 'index' of 'list' objects}

The `1910 3.039 0.002 6.648 0.003 bosh.py:4770(<lambda>)` line puzzles me. At bosh.py:4770 I have `modNames.sort(key=lambda a: (a in data) and data.index(a))`, where `data` and `modNames` are lists. Notice also `1911 3.051 0.002 3.330 0.002 {method 'index' of 'list' objects}`, which seems to come from this line.

So why is this so slow? Is there any way I can rewrite this `sort()` so it performs faster?

EDIT: a final ingredient I was missing to grok this lambda:

>>> True and 3
3
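
For illustration, here is a minimal sketch (with made-up toy lists, not the actual `data`/`modNames`) of what that key function returns for present and absent items, thanks to `and` short-circuiting:

# Toy example only: `data` here stands in for the question's list.
data = ['a', 'b', 'c']
key = lambda x: (x in data) and data.index(x)

print(key('b'))    # 'b' is in data, so the result is data.index('b') -> 1
print(key('zzz'))  # 'zzz' is not in data, so `and` short-circuits -> False (== 0)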
Mr_and_Mrs_D

  • The function body is `a in data and data.index(a)`, and the two operations in it are O(n), where n is the size of the list `data`. Without answering your entire question, I would place the blame on this lambda function. Each time it is called, the lambda will scan through `data`. If you use a dict for `data` instead of a list, it'll probably be faster since a dict has an average lookup time complexity of O(1). – YardGlassOfCode Oct 30 '14 at 02:48

1 Answer

As YardGlassOfCode stated, it's not the lambda per se which is slow, it is the O(n) operations inside the lambda which are slow. Both `a in data` and `data.index(a)` are O(n) operations, where n is the length of `data`. And as an additional affront to efficiency, the call to `index` repeats much of the work already done by `a in data`. If the items in `data` are hashable, then you can speed this up considerably by first preparing a dict:

weight = dict(zip(data, range(len(data))))
modNames.sort(key=weight.get)  # Python2, or
modNames.sort(key=lambda a: weight.get(a, -1))  # works in Python3

This is much quicker because each dict lookup is an O(1) operation.
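
To make the difference concrete, here is a rough benchmark sketch; the list contents and sizes are made up for illustration and are not the actual `data`/`modNames` from the question:

import random
import timeit

# Made-up stand-ins for the question's lists.
data = ['mod%d' % i for i in range(250)]
modNames = [random.choice(data + ['missing%d' % i for i in range(20)])
            for _ in range(2000)]

# Original key: two O(n) scans of `data` on every call.
slow_key = lambda a: (a in data) and data.index(a)

# Dict-based key: a single O(1) lookup on every call.
weight = dict(zip(data, range(len(data))))
fast_key = lambda a: weight.get(a, -1)

print(timeit.timeit(lambda: sorted(modNames, key=slow_key), number=10))
print(timeit.timeit(lambda: sorted(modNames, key=fast_key), number=10))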

Note that `modNames.sort(key=weight.get)` relies on `None` comparing as less than integers:

In [39]: None < 0
Out[39]: True

In Python 3, `None < 0` raises a TypeError. So `lambda a: weight.get(a, -1)` is used to return -1 when `a` is not in `weight`.
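
A small illustration of that difference, using a made-up `weight` dict:

weight = {'a': 0, 'b': 1}
names = ['b', 'zzz', 'a']   # 'zzz' is not in weight

# Python 2: key=weight.get returns None for 'zzz', and None sorts before ints.
# Python 3: comparing None with an int raises TypeError, so supply a default:
print(sorted(names, key=lambda a: weight.get(a, -1)))   # ['zzz', 'a', 'b']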

unutbu

  • Thanks - will test it and post back. So I guess the original also worked because `False < 0`? (`sort(key=...)` could use some docs.) – Mr_and_Mrs_D Oct 30 '14 at 10:27
  • [Since `False == 0`](http://stackoverflow.com/q/2764017/190597), `False < 0` is False, and `0 < False` is False. That means that in the original, the item at index 0 could get sandwiched in between items not in `data`. Consider, for example, `sorted([False, 0, False])`. – unutbu Oct 30 '14 at 11:25
  • NB: I used `key=lambda a: dataDict.get(a, sys.maxint)` as I wanted items that are absent to be ordered to the bottom (Python 2.7.8). – Mr_and_Mrs_D Oct 30 '14 at 16:11
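
As a sketch of that variation (again with a made-up `weight` dict; `float('inf')` is used below because `sys.maxint` no longer exists in Python 3):

weight = {'a': 0, 'b': 1}
names = ['b', 'zzz', 'a']

# Absent items first (the -1 default from the answer):
print(sorted(names, key=lambda a: weight.get(a, -1)))            # ['zzz', 'a', 'b']

# Absent items last, as in the comment (sys.maxint on Python 2,
# float('inf') on Python 3 serves the same purpose):
print(sorted(names, key=lambda a: weight.get(a, float('inf'))))  # ['a', 'b', 'zzz']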