3

I was implementing and testing answers to this SO question -

Given an array of integers find the number of all ordered pairs of elements in the array whose sum lies in a given range [a,b]

The answer with the most upvotes (currently) only provides a text description of an algorithm that should be O(NlogN):

Sort the array... . For each element x in the array: Consider the array slice after the element. Do a binary search on this array slice for [a - x], call it y0. If no exact match is found, consider the closest match bigger than [a - x] as y0. Output all elements (x, y) from y0 forwards as long as x + y <= b. ... If you only need to count the number of pairs, you can do it in O(nlogn). Modify the above algorithm so [b - x] (or the next smaller element) is also searched for.

My implementation:

import bisect
def ani(arr, a, b):
    # Sort the array (say in increasing order).
    arr.sort()
    count = 0
    for ndx, x in enumerate(arr):
        # Consider the array slice after the element
        after = arr[ndx+1:]
        # Do a binary search on this array slice for [a - x], call it y0
        lower = a - x
        y0 = bisect.bisect_left(after, lower)
        # If you only need to count the number of pairs
        # Modify the ... algorithm so [b - x] ... is also searched for
        upper = b - x
        y1 = bisect.bisect_right(after, upper)
        count += y1 - y0
    return count

When I plot Time versus N or some function of N I am seeing an exponential or N^2 response.

# generate timings
T = list()    # run-times
N = range(100, 10001, 100)    # N
arr = [random.randint(-10, 10) for _ in xrange(1000000)]
print 'start'
start = time.time()
for n in N:
    arr1 = arr[:n]
    t = Timer('ani(arr1, 5, 16)', 'from __main__ import arr1, ani')
    timing_loops = 100
    T.append(t.timeit(timing_loops) / timing_loops)

Is my implementation incorrect or is the author's claim incorrect?

Here are some plots of the data.

T vs N T vs N T / NlogN vs N - one commenter thought this should NOT produce a linear plot - but it does. T / NlogN vs N T vs NlogN - I thought this should be linear if the complexity is NlogN but it is not. T vs NlogN

Community
  • 1
  • 1
wwii
  • 23,232
  • 7
  • 37
  • 77
  • 1
    Why are you slicing at all? `bisect` functions take lower and upper bounds, so you can avoid creating a new list altogether (which cumulative takes 2N operations, times N steps, making your code N^2, albeit in fast C code for the inner loop). – Martijn Pieters Jan 18 '15 at 02:12
  • @MartijnPieters unfortunately I was trying to keep it simple for clarity then couldn't see the consequence of that decision. – wwii Jan 18 '15 at 02:34
  • And a more compact implementation using `sum()` to illustrate: https://gist.github.com/mjpieters/88c039207be8353e8620 – Martijn Pieters Jan 18 '15 at 02:58

1 Answers1

6

If nothing else, this is your error:

for ndx, x in enumerate(arr):
    # Consider the array slice after the element
    after = arr[ndx+1:]

arr[ndx+1:] creates a copy of the list of length len(arr) - ndx, so therefore your loop is O(n^2).

Instead, use the lo and hi arguments to bisect.bisect.

Dolda2000
  • 25,216
  • 4
  • 51
  • 92
  • I'm not sure I follow the recommendation. Are you saying to just search the entire array not a slice of of it? Also do you think I implemented the author's intent? – wwii Jan 18 '15 at 02:14
  • 1
    No, please see the linked documentation. The `lo` and `hi` arguments are precisely for searching through only a slice of the list, but without needing to create that slice explicitly. – Dolda2000 Jan 18 '15 at 02:15
  • And yes, it does seem that you implemented the author's intent. – Dolda2000 Jan 18 '15 at 02:17
  • @wwii: no, instead of creating a new list `after`, use `ndx + 1` as a lower bound for the `bisect.bisect_left()` function; the third argument. Replace `bisect.bisect_left(after, lower)` with `bisect.bisect_left(arr, lower, ndx + 1)` to start the bisection at index `ndx + 1`. – Martijn Pieters Jan 18 '15 at 02:18
  • Very good thank you. I couldn't see the forest for the trees. – wwii Jan 18 '15 at 02:31