I want to implement the following procedure in the fastest way using Python 3: given a list of N random integers, I need to return the K smallest ones (and I do not need the returned integers to be sorted). I implemented it in three different ways, as you can see in the code below.
The test_sorted() function uses the built-in sorted() function to order the whole list of integers and then takes a slice of the first K elements. The cost of this operation should be essentially the cost of running sorted(), which has a time complexity of O(N log(N)).

The test_heap() function uses a heap to store only the lowest K elements and returns them. Since the heap never holds more than K elements, pushing an element onto it has a time complexity of O(log(K)), and in theory the number of times we need to push an item onto the heap is N. However, after the first K insertions we push and pop in a single step, and I would expect that when the incoming element is bigger than every element in the heap no real insertion occurs, so the time complexity should be somewhere between O(N) and O(N log(K)) (depending on the actual ordering of the input list). Anyway, even if my assumption is not true, the worst-case complexity should be O(N log(K)) (as usual, I consider the cost of the individual comparisons negligible).

The test_nsmallest() function uses the nsmallest() function from the heapq module. I had no expectations about this approach, and since the official Python documentation only says that "For larger values, it is more efficient to use the sorted() function", I decided to give it a try.
    # test.py
    from heapq import heappush, heappushpop, nsmallest
    from random import randint
    from timeit import timeit

    N, K = 1000, 50
    RANDOM_INTS = [randint(1, 100) for _ in range(N)]

    def test_sorted():
        return sorted(RANDOM_INTS)[:K]

    def test_heap():
        # Max-heap of the K smallest values, stored negated.
        heap = []
        for val in RANDOM_INTS:
            if len(heap) < K:
                heappush(heap, -val)
            else:
                heappushpop(heap, -val)
        return [-val for val in heap]

    def test_nsmallest():
        return nsmallest(K, RANDOM_INTS)

    def main():
        sorted_result = timeit("test_sorted()", globals=globals(), number=100_000)
        print(f"test_sorted took: {sorted_result}")
        heap_result = timeit("test_heap()", globals=globals(), number=100_000)
        print(f"test_heap took: {heap_result}")
        nsmallest_result = timeit("test_nsmallest()", globals=globals(), number=100_000)
        print(f"test_nsmallest took: {nsmallest_result}")

        r1, r2, r3 = test_sorted(), test_heap(), test_nsmallest()
        assert len(r1) == len(r2) == len(r3)
        assert set(r1) == set(r2) == set(r3)

    if __name__ == '__main__':
        main()
The output on my (old) late 2011 MacBook Pro with a 2.4GHz i7 processor is as follows.
$ python --version
Python 3.9.2
$ python test.py
test_sorted took: 8.389572635999999
test_heap took: 18.586762750000002
test_nsmallest took: 13.772040639000004
The simplest solution using sorted() is by far the best. Can anyone elaborate on why the outcome does not match my expectation (i.e., that the test_heap() function should be at least a bit faster)? What am I missing?
If I run the same code with pypy the result is the opposite.
$ pypy --version
Python 3.7.10 (51efa818fd9b, Apr 04 2021, 12:03:51)
[PyPy 7.3.4 with GCC Apple LLVM 12.0.0 (clang-1200.0.32.29)]
$ pypy test.py
test_sorted took: 7.1336525249998886
test_heap took: 3.1177806880004937
test_nsmallest took: 7.5453417899998385
And this is something closer to my expectations.
Provided that I know nothing about the Python internals and only have a very rough understanding of why PyPy is faster than CPython, can anyone elaborate on these results and add some information about what is going on, so that I can correctly foresee the best choice in similar situations in the future?
Also, if you have any suggestion about other implementations that run faster than the above ones, please feel free to share!
UPDATE:

What if we need to sort the input list according to some criterion other than the value of the items themselves (as I actually need in my real use case; the above is just a simplification)? Well, in this case the outcome is even more surprising:
    # test2.py
    from heapq import heappush, heappushpop, nsmallest
    from random import randint
    from timeit import timeit

    N, K = 1000, 50
    RANDOM_INTS = [randint(1, 100) for _ in range(N)]

    def test_sorted():
        return sorted(RANDOM_INTS, key=lambda x: x)[:K]

    def test_heap():
        # Decorate each value with its (negated) sort key.
        heap = []
        for val in RANDOM_INTS:
            if len(heap) < K:
                heappush(heap, (-val, val))
            else:
                heappushpop(heap, (-val, val))
        return [val for _, val in heap]

    def test_nsmallest():
        return nsmallest(K, RANDOM_INTS, key=lambda x: x)

    def main():
        sorted_result = timeit("test_sorted()", globals=globals(), number=100_000)
        print(f"test_sorted took: {sorted_result}")
        heap_result = timeit("test_heap()", globals=globals(), number=100_000)
        print(f"test_heap took: {heap_result}")
        nsmallest_result = timeit("test_nsmallest()", globals=globals(), number=100_000)
        print(f"test_nsmallest took: {nsmallest_result}")

        r1, r2, r3 = test_sorted(), test_heap(), test_nsmallest()
        assert len(r1) == len(r2) == len(r3)
        assert set(r1) == set(r2) == set(r3)

    if __name__ == '__main__':
        main()
Which outputs:
$ python test2.py
test_sorted took: 18.740868524
test_heap took: 27.694126547999996
test_nsmallest took: 25.414596833000004
$ pypy test2.py
test_sorted took: 65.88409741500072
test_heap took: 3.9442632220016094
test_nsmallest took: 19.981832798999676
This tells me at least two things:

1. Using an external key for sorting is extremely expensive, both when you provide a lambda function via the key kwarg and when you need to build a (sorting_value, actual_value) tuple to obtain the desired ordering in the heap.

2. Using lambdas with PyPy seems to be extremely expensive, but I don't know why... maybe PyPy cannot optimize them and this doesn't play along with the other optimizations it performs?