11

Given an array of integers find the number of all ordered pairs of elements in the array whose sum lies in a given range [a,b]

Here is an O(n^2) solution for the same

'''
counts all pairs in array such that the 
sum of pair lies in the range a and b
'''
def countpairs(array, a, b):
    num_of_pairs = 0
    for i in range(len(array)):
        for j in range(i+1,len(array)):
            total = array[i] + array[j]
            if total >= a and total <= b:
                num_of_pairs += 1
    return num_of_pairs

I know my solution is not optimal What is a better algorithm for doing this.

akashchandrakar
  • 2,009
  • 3
  • 23
  • 47

9 Answers9

14
  1. Sort the array (say in increasing order).
  2. For each element x in the array:
    • Consider the array slice after the element.
    • Do a binary search on this array slice for [a - x], call it y0. If no exact match is found, consider the closest match bigger than [a - x] as y0.
    • Output all elements (x, y) from y0 forwards as long as x + y <= b

The time complexity is of course output-sensitive, but this is still superior to the existing algo:

O(nlogn) + O(k)

where k is the number of pairs that satisfy the condition.

Note: If you only need to count the number of pairs, you can do it in O(nlogn). Modify the above algorithm so [b - x] (or the next smaller element) is also searched for. This way, you can count the number of 'matches' each element has in O(logn) simply from the indices of the first and last match. Then it's just a question of summing those up to get the final count. This way, the initial O(nlogn) sorting step is dominant.

Ani
  • 111,048
  • 26
  • 262
  • 307
  • i was writing the exact same solution :) – Ali Akber Dec 30 '14 at 12:39
  • 3
    This is not optimal. The last comment in this solution is misleading: even if the list is already sorted, it's still an nlogn algorithm. You have to do two binary searches for every index, each binary search costs logn and there are n indices. I give a solution that is sort + O(N). While of course both solutions are nlogn, mine is strictly faster for N sufficiently large. – Nir Friedman Jan 07 '15 at 05:12
  • 1
    @wwii I'm asking this respectfully but sincerely: what's your point? – Nir Friedman Jan 12 '15 at 15:53
  • @NirFriedman I just wanted to confirm the time complexity then, I guess naively, I just made a comment of the results. Initially I thought the actual time values (T), not logT, should have correlated with logN. – wwii Jan 13 '15 at 04:50
  • @wwii plotting log-log graphs is dangerous, and care needs to be taken. First of all, log-log plotting is more suitable when you have a power relationship, e.g. t = aN^2. Then logt = loga + 2logN. However, even here, fitting a straight line can give you systematically incorrect results because the log function does not symmetrically distribute error. Second, in this particular case, you are plotting t = NlogN log-log, i.e. logt = logN + loglogN. LoglogN grows incredibly slowly, which is why your line looks 'almost' straight. – Nir Friedman Jan 13 '15 at 14:53
  • I'm more than happy to send you more literature on this topic (power law relationships were a major topic in the field in which I did my PhD, so log-log plots and common mistakes using them was something I grew to be very aware of). – Nir Friedman Jan 13 '15 at 14:54
  • @NirFriedman Hmmm ... how would you go about verifying the time complexity of an algorithm. – wwii Jan 14 '15 at 01:51
  • Unfortunately, you can't really verify it per se, it only applies for sufficiently large N, and you don't know how big N needs to be. It's more of a mathematical statement than a prediction, really. By verification, I assume you are ok with some kind of reasonable visual verification? I would advise plotting T/ NlogN on the y axis, versus N on the x axis. The curve should asymptote to a horizontal line as you move right, if you can make N large enough (and NlogN is the complexity, which it is). – Nir Friedman Jan 14 '15 at 15:45
  • @NirFriedman - T/ NlogN vs N looks like a linear response. So I guess the point of my initial comment (which somehow got deleted) is that I am seeing more like an exponential complexity with this algorithm. – wwii Jan 15 '15 at 05:37
  • If it really looks linear, then that means that T / NlogN = kN, which implies that T is N^2 logN, not exponential. This algorithm is definitely NlogN though, so it's either a) some mistaking you are making with your methodology, or b) N is just not large enough. I would probably start by doing just plotting the times for the sort in the way I described and see if it works out, since the built in sort of python is nlogn. – Nir Friedman Jan 15 '15 at 17:28
11

Sort the array first and count the pairs by two indexes. The two indexes approach is similar to the one in 2-sum problem, which avoids the binary-search for N times. The time consuming of the algorithm is Sort Complexity + O(N), typically, sort is O(NlnN), thus this approach is O(NlnN). The idea of the algorithm is, for an index i, find an lower bound and an upper bound such that a <= arr[i]+arr[low] <= arr[i]+arr[high] <= b and when i increases, what we should do is to decrease low and high to hold the condition. To avoid counting the same pair twice, we keep low > i, also we keep low <= high. The complexity of the following counting approach is O(N), because, in the while loop, what we can do is ++i or --low or --high and there are at most N such operations.

//count pair whose sum is in [a, b]
//arr is a sorted array with size integers.
int countPair(int arr[], int size, int a, int b) {
    int cnt = 0;
    int i = 0, low = size-1, high = size-1;
    while (i < high) {
        //find the lower bound such that arr[i] + arr[low] < a, 
        //meanwhile arr[i]+arr[low+1] >= a
         low = max(i, low);
         while (low > i && arr[i] + arr[low] >= a) --low;

        //find an upper bound such that arr[i] + arr[high] <= b 
        //meanwhile, arr[i]+arr[high+1] > b
        while (high > low && arr[i] + arr[high] > b) --high; 
        //all pairs: arr[i]+arr[low+1], arr[i]+arr[low+2],...,arr[i]+arr[high]
        //are in the rage[a, b], and we count it as follows.
        cnt += (high-low);
        ++i;
    }
    return cnt;
}
Community
  • 1
  • 1
notbad
  • 2,797
  • 3
  • 23
  • 34
  • 1
    This looks N^2 to me. N for loop iterations, each one of which can have up to N iterations between the two while loops. Consider the case where all pairs work. – Nir Friedman Jan 07 '15 at 04:38
  • 2
    @Nir Friedman: The initial values of `st` does not start at zero or `i` in each iteration of the outer loop, it increases in every iteration. If `end` is found (by , e.g., binary search) for the first iteration, and increased in the inner loop, things might look better … – greybeard Jan 07 '15 at 09:36
  • 1
    @NirFriedman It is not N^2, but my last implementation is a little misleading. I have reimplemented it, it looks more like a linear algorithm:) – notbad Jan 07 '15 at 16:18
  • 1
    @NirFriedman Consider the case where all pairs work, st will increase by 1 each loop but `end` will be always `size-1`. So it is linear:) – notbad Jan 07 '15 at 16:23
  • 1
    @zhiwenf yes, I realized that after I wrote. I believe the previous implementation was correct now, however it was extremely confusingly written. The new implementation is great. The solution I gave is similar, but zhiwenf is more efficient as he skips the intermediate step. I'll leave my solution up as it may be of interest as it generates a lookup table essentially, so you can not only count but also read the pairs off. But this answer should receive the bounty, good work zhiwenf. – Nir Friedman Jan 07 '15 at 16:28
  • 1
    This is indeed O(N). I tried a Python implementation of this and believe I got it correct - However this does not provide the same *answer* as the OP's function is the OP's function faulty? – wwii Jan 10 '15 at 19:12
  • @wwii The edge condition is incorrect. I have updated it and also test it by some cases. I should be right now. Please let me know if you find anything wrong:) thanks! – notbad Jan 10 '15 at 21:07
  • Using ```arr = [random.randint(-10, 10) for _ in xrange(1000)]; a = 6; b = 16```, your first version was under reporting and the second is over reporting. Perhaps [my Python rendition](https://gist.github.com/d6bf42ff33bb5388d40f.git) of your function is incorrect – wwii Jan 10 '15 at 23:24
  • @wwii Do you sort arr before you call the counting function? It seems that I can't open the link. – notbad Jan 10 '15 at 23:42
  • @wwii Line 9 of your code, it should be "arr[i] + arr[low] >= a" but "arr[i] + arr[high] >= a":) – notbad Jan 11 '15 at 01:51
7

The problem of counting the pairs that work can be done in sort time + O(N). This is faster than the solution that Ani gives, which is sort time + O(N log N). The idea goes like this. First you sort. You then run nearly the same single pass algorithm twice. You then can use the results of the two single pass algorithms to calculate the answer.

The first time we run the single pass algorithm, we will create a new array that lists the smallest index that can partner with that index to give a sum greater than a. Example:

a = 6
array = [-20, 1, 3, 4, 8, 11]
output = [6, 4, 2, 2, 1, 1]

So, the number at array index 1 is 1 (0 based indexing). The smallest number it can pair with to get over 6 is the eight, which is at index 4. Hence output[1] = 4. -20 can't pair with anything, so output[0] = 6 (out of bounds). Another example: output[4] = 1, because 8 (index 4) can pair with the 1 (index 1) or any number after it to sum more than 6.

What you need to do now is convince yourself that this is O(N). It is. The code is:

i, j = 0, 5
while i - j <= 0:
  if array[i] + array[j] >= a:
    output[j] = i
    j -= 1
  else:
    output[i] = j + 1
    i += 1

Just think of two pointers starting at the edges and working inwards. It's O(N). You now do the same thing, just with the condition b <= a:

while i-j <= 0:
  if array[i] + array[j] <= b:
    output2[i] = j
    i += 1
  else:
    output2[j] = i-1
    j-=1

In our example, this code gives you (array and b for reference):

b = 9
array = [-20, 1, 3, 4, 8, 11]
output2 = [5, 4, 3, 3, 1, 0]

But now, output and output2 contain all the information we need, because they contain the range of valid indices for pairings. output is the smallest index it can be paired with, output2 is the largest index it can be paired with. The difference + 1 is the number of pairings for that location. So for the first location (corresponding to -20), there are 5 - 6 + 1 = 0 pairings. For 1, there are 4-4 + 1 pairings, with the number at index 4 which is 8. Another subtlety, this algo counts self pairings, so if you don't want it, you have to subtract. E.g. 3 seems to contain 3-2 + 1 = 2 pairings, one at index 2 and one at index 3. Of course, 3 itself is at index 2, so one of those is the self pairing, the other is the pairing with 4. You just need to subtract one whenever the range of indices of output and output2 contain the index itself you're looking at. In code, you can write:

answer = [o2 - o + 1 - (o <= i <= o2) for i, (o, o2) in enumerate(zip(output, output2))]

Which yields:

answer = [0, 1, 1, 1, 1, 0]

Which sums to 4, corresponding to (1,8), (3,4), (4,3), (8, 1)

Anyhow, as you can see, this is sort + O(N), which is optimal.

Edit: asked for full implementation. Provided. For reference, the full code:

def count_ranged_pairs(x, a, b):
    x.sort()

    output = [0] * len(x)
    output2 = [0] * len(x)

    i, j = 0, len(x)-1
    while i - j <= 0:
      if x[i] + x[j] >= a:
        output[j] = i
        j -= 1
      else:
        output[i] = j + 1
        i += 1

    i, j = 0, len(x) - 1
    while i-j <= 0:
      if x[i] + x[j] <= b:
        output2[i] = j
        i += 1
      else:
        output2[j] = i-1
        j -=1

    answer = [o2 - o + 1 - (o <= i <= o2) for i, (o, o2) in enumerate(zip(output, output2))]
    return sum(answer)/2
Nir Friedman
  • 17,108
  • 2
  • 44
  • 72
  • 2
    Any chance you could throw together a full implementation? – Eric Jan 10 '15 at 00:37
  • Can you explain the logic behind `i - j <= 1`? Why do you allow your forward pointer, `i`, to overtake the backwards pointer, `j`, by 1? – Eric Jan 11 '15 at 13:31
  • This was just a bug that surprisingly didn't affect the outcome when I ran the code. – Nir Friedman Jan 12 '15 at 02:38
  • 1
    The more frequently used notation would seem to be `while i <= j`. Then, there is [loop jamming](http://en.wikipedia.org/wiki/Loop_jamming), variable names more suggestive than `i` and `j`, … – greybeard Jan 12 '15 at 08:24
  • I think the way I write the inequality is a nitpick, and i and j are common names for loop indices. Maybe not the best, but they're fine. This is SO and not code review, and this is still extremely readable code. As for loop jamming, I have no idea how to jam these two loops, if you know how please by all means edit my answer. – Nir Friedman Jan 12 '15 at 14:49
  • Should you return half the sum of answer to be consistent with OP's posted code? – wwii Jan 13 '15 at 14:38
  • Corrected. Thanks for fixing the variable names as well wwii. – Nir Friedman Jan 13 '15 at 14:49
5
from itertools import ifilter, combinations

def countpairs2(array, a, b):
    pairInRange = lambda x: sum(x) >= a and sum(x) <= b
    filtered = ifilter(pairInRange, combinations(array, 2))
    return sum([2 for x in filtered])

I think the Itertools library comes in quite handy. I also noticed you counted pairs twice, for example you counted (1, 3) and (3, 1) as two different combinations. If you don't want that, just change the 2 in the last line to a 1. Note: The last could be changed to return len(list(filtered)) * 2. This CAN be faster, but at the expense of using more RAM.

PVNRT
  • 235
  • 1
  • 4
  • 14
  • This is no better for time-complexity than the original answer, although admittedly clearer – Eric Jan 10 '15 at 00:04
  • _"for example you counted (1, 3) and (3, 1) as two different combinations"_ - That's not what _"ordered pairs"_ means – Eric Jan 10 '15 at 00:05
3

With some constraints on the data we can solve problem in linear time (sorry for Java, I'm not very proficient with Python):

public class Program {
    public static void main(String[] args) {
        test(new int[]{-2, -1, 0, 1, 3, -3}, -1, 2);
        test(new int[]{100,200,300}, 300, 300);
        test(new int[]{100}, 1, 1000);
        test(new int[]{-1, 0, 0, 0, 1, 1, 1000}, -1, 2);
    }

    public static int countPairs(int[] input, int a, int b) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int el : input) {
            max = Math.max(max, el);
            min = Math.min(min, el);
        }
        int d = max - min + 1; // "Diameter" of the array
        // Build naive hash-map of input: Map all elements to range [0; d]
        int[] lookup = new int[d];
        for (int el : input) {
            lookup[el - min]++;
        }
        // a and b also needs to be adjusted
        int a1 = a - min;
        int b1 = b - min;
        int[] counts = lookup; // Just rename
        // i-th element contain count of lookup elements in range [0; i]
        for (int i = 1; i < counts.length; ++i) {
            counts[i] += counts[i - 1];
        }
        int res = 0;
        for (int el : input) {
            int lo = a1 - el; // el2 >= lo
            int hi = b1 - el; // el2 <= hi
            lo = Math.max(lo, 0);
            hi = Math.min(hi, d - 1);
            if (lo <= hi) {
                res += counts[hi];
                if (lo > 0) {
                    res -= counts[lo - 1];
                }
            }
            // Exclude pair with same element
            if (a <= 2*el && 2*el <= b) {
                --res;
            }
        }
        // Calculated pairs are ordered, divide by 2
        return res / 2;
    }

    public static int naive(int[] ar, int a, int b) {
        int res = 0;
        for (int i = 0; i < ar.length; ++i) {
            for (int j = i + 1; j < ar.length; ++j) {
                int sum = ar[i] + ar[j];
                if (a <= sum && sum <= b) {
                    ++res;
                }
            }
        }
        return res;
    }

    private static void test(int[] input, int a, int b) {
        int naiveSol = naive(input, a, b);
        int optimizedSol = countPairs(input, a, b);
        if (naiveSol != optimizedSol) {
            System.out.println("Problem!!!");
        }
    }
}

For each element of the array we know the range in which second element of the pair can lay. Core of this algorithm is giving the count of elements in range [a; b] in O(1) time.

Resulting complexity is O(max(N, D)), where D is difference between max and min elements of the array. If this value is same order as N - complexity is O(N).

Notes:

  • No sorting involved!
  • Building lookup is required to make algorithm work with negative numbers and make second array as small as possible (positively impacts both memory and time)
  • Ugly condition if (a <= 2*el && 2*el <= b) is required because algorithm always counts pairs (a[i],a[i])
  • Algorithm requires O(d) additional memory which can be a lot.

Another linear algorithm would be radix sort + linear pair counting.

EDIT. This algorithm can be really good in case if D is considerably smaller than N and you are not allowed to modify the input array. Alternative option for this case would be slightly modified counting sort with allocation of counts array (additional O(D) memory) but without populating sorted elements back to input array. It's possible to adapt pair counting to use counts array instead of full sorted array.

Jarlax
  • 1,586
  • 10
  • 20
  • The problem frankly is that the problem statement says nothing about D. So the algorithmic complexity of your answer now has no upper bound in terms of N. In particular, you won't be able to put an upper bound on running time without looking at the entire input in detail, whereas the other algorithms only need the size of the array to put an upper bound. So in sum, this is a nice idea, but doesn't quite work out for the reasons I stated. Just thought I would give this comment in case you're wondering why you're not getting upvotes, despite this idea being neat (and it is!). – Nir Friedman Jan 12 '15 at 16:01
  • 1
    @NirFriedman Thanks! I've pointed out that complexity will be O(max(N, D)) and there is no guarantee on N in general case. In fact, JVM will not allow you to create array bigger than Integer.MAX_VALUE - 5 (if I'm not mistaken). For other platforms limits are similar - int.MaxValue for .NET, PY_SSIZE_T_MAX/sizeof(PyObject*) for Python etc. But with additional knowledge of your data algorithm still can be useful. For example - elements can represent money amounts or item weights or ages. – Jarlax Jan 12 '15 at 16:58
  • 1
    I do agree that domain knowledge can elevate this algorithm up to being the best of those here. That's part of why I +1'ed it :-) I'm not saying this algorithm is worse than any other, it is however less general, and since the user's question was general, it's therefore not as good an answer to the specific question at hand. – Nir Friedman Jan 12 '15 at 17:21
2

I have a solution(actually 2 solutions ;-)). Writing it in python:

def find_count(input_list, min, max):
    count = 0
    range_diff = max - min
    for i in range(len(input_list)):
        if input_list[i]*2 >= min and input_list[i]*2 <= max:
            count += 1
        for j in range(i+1, len(input_list)):
            input_sum = input_list[i] + input_list[j]
            if input_sum >= min and input_sum <= max:
                count += 2

This will run nCr(n combinations) times to the max and gives you the required count. This will be better than sorting the list and then finding the pairs in a range. If the number of elements that fail the combination is greater as well as all the numbers are positive integers, we can improve the result a little better by adding a condition that checks the elements for,

  • Numbers that do not fall under the range even with the addition of the max value
  • Numbers that are greater than the maximum number of the range.

Something like this:

# list_maximum is the maximum number of the list (i.e) max(input_list), if already known
def find_count(input_list, min, max, list_maximum):
    count = 0
    range_diff = max - min
    for i in range(len(input_list)):
        if input_list[i] > max or input_list[i] + list_maximum < min:
            continue
        if input_list[i]*2 >= min and input_list[i]*2 <= max:
            count += 1
        for j in range(i+1, len(input_list)):
            input_sum = input_list[i] + input_list[j]
            if input_sum >= min and input_sum <= max:
                count += 2

I will also be happy to learn any better solution than this :-) If i come across one, I will update this answer.

thiruvenkadam
  • 4,170
  • 4
  • 27
  • 26
1

I believe this is a simple math problem, that could be solved with numpy with no loops and no sorting on our part. I'm not exactly sure, but I believe the complexity to be O(N^2) at worse case (would love some confirmation on that by someone more knowledgeable with time complexities in numpy).

At any rate, here's my solution:

import numpy as np

def count_pairs(input_array, min, max):
    A = np.array(input_array)
    A_ones = np.ones((len(A),len(A)))
    A_matrix = A*A_ones
    result = np.transpose(A_matrix) + A_matrix
    result = np.triu(result,0)
    np.fill_diagonal(result,0)
    count = ((result > min) & (result < max)).sum()
    return count

Now let's walk through it - first I just create a matrix with columns representing our numbers:

A = np.array(input_array)
A_ones = np.ones((len(A),len(A)))
A_matrix = A*A_ones

Let's assume that our input array looked like: [1,1,2,2,3,-1],thus, this should be the value of A_matrix at this point.

[[ 1.  1.  2.  2.  3. -1.]
 [ 1.  1.  2.  2.  3. -1.]
 [ 1.  1.  2.  2.  3. -1.]
 [ 1.  1.  2.  2.  3. -1.]
 [ 1.  1.  2.  2.  3. -1.]
 [ 1.  1.  2.  2.  3. -1.]]

If I add that to the transpose of itself...

result = np.transpose(A_matrix) + A_matrix

...I should get a matrix representing all combinations of sums of pairs:

[[ 2.  2.  3.  3.  4.  0.]
 [ 2.  2.  3.  3.  4.  0.]
 [ 3.  3.  4.  4.  5.  1.]
 [ 3.  3.  4.  4.  5.  1.]
 [ 4.  4.  5.  5.  6.  2.]
 [ 0.  0.  1.  1.  2. -2.]]

Of course, this matrix is mirrored across the diagonal because the pairs (1,2) and (2,1) produce the same result. We don't want to consider these duplicate entries. We also don't want to consider the sum of an item with itself, so let's sanitize our array:

result = np.triu(result,0)
np.fill_diagonal(result,0)

Our result now looks like:

[[ 0.  2.  3.  3.  4.  0.]
 [ 0.  0.  3.  3.  4.  0.]
 [ 0.  0.  0.  4.  5.  1.]
 [ 0.  0.  0.  0.  5.  1.]
 [ 0.  0.  0.  0.  0.  2.]
 [ 0.  0.  0.  0.  0.  0.]]

All that remains is to count the items that pass our criteria.

count = ((result > min) & (result < max)).sum()

A word of caution:

This method won't work if 0 is in the acceptable domain, but I'm sure it would be trivial to manipulate that result matrix above to convert those 0's to some other meaningless number....

Quentin Donnellan
  • 2,687
  • 1
  • 18
  • 24
  • 8
    You generated matrices that are NxN, that immediately makes your algorithm O(N^2) no matter how trivial it is to generate your matrix. This solution is worse than the naive double for loop solution, as it uses more space. – Nir Friedman Jan 07 '15 at 18:49
  • 2
    You can replace `A_ones = np.ones((len(A),len(A)))` `A_matrix = A*A_ones` `result = np.transpose(A_matrix) + A_matrix` with `result = A + A[:,np.newaxis]` – Eric Jan 10 '15 at 00:40
  • The criteria is ```>=a``` and ```<=b```. Even with this correction, ```count = ((result >= _min) & (result <= _max)).sum()```, your function is returning one less than the OP's function using ```arr = [random.randint(-10, 10) for _ in xrange(1000)]; a = 6; b = 16``` – wwii Jan 10 '15 at 23:13
  • Hmm, now I cant reprouce the *one less* error, mea culpa. [Your solution refactored](https://gist.github.com/anonymous/0c64250a636a514a656f) – wwii Jan 10 '15 at 23:50
  • Great advice folks; for whatever reason I do enjoy exploring linear alg. solutions as alternatives to the typical loop'd solutions. Perhaps that's the MatLab in me :) . @wwii thanks for the refactor! – Quentin Donnellan Jan 11 '15 at 14:31
1

Rather than using the relational operators, we can simply check if the sum of array elements i and j are in the specified range.

def get_numOfPairs(array, start, stop):
    num_of_pairs = 0
    array_length = len(array)

    for i in range(array_length):
        for j in range(i+1, array_length):
            if sum([array[i], array[j]]) in range(start, stop):
                num_of_pairs += 1

    return num_of_pairs
Sjshovan
  • 310
  • 2
  • 5
0
n = int(input())
ar = list(map(int, input().rstrip().split()))[:n]
count=0
uniq=[]
for i in range(n):
    if ar[i] not in uniq:
        uniq.append(ar[i])
for j in uniq:
    if ((ar.count(j))%2==0):
        count=count+((ar.count(j))/2)
    if ((ar.count(j))%2!=0) & (((ar.count(j))-1)%2==0):
        count=count+((ar.count(j)-1)/2)
print(int(count))