
I'm trying to optimize a solution that times out on a competitive programming website. I started using cProfile, and it appears to show bisect_right() requiring roughly 3x as much time as insort_right(). This is a profiling run on an input list with over 40k entries:

         89936 function calls in 3.596 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.596    3.596 <string>:1(<module>)
        1    0.180    0.180    3.596    3.596 leetcode.py:38(go)
        1    3.357    3.357    3.415    3.415 leetcode.py:6(numSubarrayProductLessThanK)
    44965    0.045    0.000    0.045    0.000 {built-in method _bisect.bisect_right}
    44965    0.014    0.000    0.014    0.000 {built-in method _bisect.insort_right}
        1    0.000    0.000    3.596    3.596 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

I thought all list insertions were O(n), since on average n/2 elements must be moved, while simply determining the insertion location in a sorted list would be O(log n). But the profile report shows the opposite of what I expected: the location finder bisect_right() is slower than the inserter insort_right(). Where am I going wrong? Here is the code:

from bisect import insort, bisect   # bisect is bisect_right, insort is insort_right

class Solution:
    def numSubarrayProductLessThanK(self, nums, k):
        if k == 0:
            return 0

        result = 0
        product = 1
        products = [1]      # sorted list of prefix products, seeded with the empty prefix
        products_n = 1

        for num in nums:
            product *= num

            # subarrays ending here with product < k correspond to stored
            # prefix products strictly greater than product // k
            factor_min = product // k
            n_factors_less = products_n - bisect(products, factor_min)

            result += n_factors_less

            insort(products, product)
            products_n += 1

        return result
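
For reference, the profile above was collected with a driver along these lines. The actual go() at leetcode.py:38 isn't shown, so this is only a sketch that feeds a 44,965-element list of positive integers (matching the call counts in the profile) into the method:

import cProfile
import random

def go():
    # hypothetical stand-in for the real go(): the actual input values and k
    # aren't shown here, only that there are 44,965 positive integers
    nums = [random.randint(1, 100) for _ in range(44965)]
    print(Solution().numSubarrayProductLessThanK(nums, 10**6))

cProfile.run('go()')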

Thank you for looking.

martineau
  • https://stackoverflow.com/questions/1110332/performance-of-list-insert is very closely related (asked by someone else also curious about bisect performance), though not duplicative. – Charles Duffy Dec 19 '19 at 22:18
  • ...that said, insofar as your goal is to figure out real-world performance, I could see us getting into the space where the details of your hardware are relevant -- big-O describes how the number of operations changes with the size of the data structure, sure, but it doesn't say anything about the constant factors around how long those operations *actually take* in wall-clock time. – Charles Duffy Dec 19 '19 at 22:22
  • Your posted code is calling `bisect` and `insort` with completely different arguments. It looks like the `insort` calls may be more amenable to branch prediction, and/or they may have better cache behavior. (It looks like `insort` is always called with a `product` that will end up at the end of the `products` list.) – user2357112 Dec 19 '19 at 22:28

1 Answer


Your bisect and insort calls pass completely different arguments.

Assuming nums is a list of positive integers, your insort calls will always insert the new product at the end of the products list, which takes amortized O(1) rather than O(n) time for insertion, and which is highly amenable to branch prediction for the binary search. (Python object comparisons are more complex than a simple hardware comparison instruction, but branch prediction still applies to the logic within the comparison hooks, particularly since int comparisons are implemented in C.)
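
You can see this directly with a minimal sketch (assuming, per the above, that nums holds only positive integers): the running product never decreases, so the insertion point insort computes for it is always len(products), and the underlying list.insert at the end is effectively an append.

from bisect import bisect_right, insort_right

products = [1]
product = 1
for num in [10, 5, 2, 6]:            # any positive integers
    product *= num
    # the new product is >= everything already stored, so its
    # insertion point is always len(products)
    assert bisect_right(products, product) == len(products)
    insort_right(products, product)  # list.insert at the end: effectively an append

print(products)  # [1, 10, 50, 100, 600]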

Meanwhile, assuming k is substantially greater than typical elements of nums, your bisect calls will find a spot for factor_min somewhere in the middle of the list, likely near but not at the end most of the time. This is much less amenable to branch prediction.

Without a more sophisticated test, I cannot be sure that branch prediction is the dominating factor, but it seems likely.
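
One rough probe of the search-position effect alone (not a sophisticated test, and the constant factors will vary by machine): time bisect_right() against the same sorted list with a needle that always lands at the very end versus one that lands mid-list. Both searches perform the same number of comparisons, so any gap between them is purely a constant-factor effect.

from bisect import bisect_right
from timeit import timeit

data = list(range(100_000))
end_needle = data[-1]              # insertion point is len(data); every probe branches the same way
mid_needle = data[len(data) // 2]  # insertion point is mid-list; probe directions alternate

t_end = timeit(lambda: bisect_right(data, end_needle), number=200_000)
t_mid = timeit(lambda: bisect_right(data, mid_needle), number=200_000)
print(f"end needle: {t_end:.3f}s   mid needle: {t_mid:.3f}s")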

user2357112
  • I might be blind, but I don't see the completely different arguments. Both `bisect()` and `insort()` calls are passed the `products` list. That's a good observation about product always growing, due to the `nums` members being positive integers, but `insort()` doesn't know that; it must still do log2 steps to determine that the end is the correct place, right? Or does it have a quick begin/end check before going into binary search mode? That might explain things! Insertion is instant because I'm constantly adding to the end, and search takes time because the searched element is midway. – Andrew Lamoureux Dec 20 '19 at 00:12
  • On second consideration, I see that you mean that what's being inserted and what's being searched are completely different. Thanks for your answer, I accepted it. – Andrew Lamoureux Dec 20 '19 at 01:16