18

I am looking for the solution of following algorithm with minimal time and space complexity.

Given two arrays a and b, find all pairs of elements (a1,b1) such that a1 belongs to Array A and b1 belongs to Array B whose sum a1+b1 = k (any integer).

I was able to come up with O(n log n) approach where we will sort one of the array say A and for each of the element b in array B, do binary search on sorted array A for value (K-b) .

Can we improve it any further?

TopCoder
  • 4,206
  • 19
  • 52
  • 64
  • This is likely to be the fastest generic solution you can get, since the constant factor for seeking `n` times in a sorted array is lower than for sorting an unsorted array of length `n`, typically. So you probably only want to sort one array (the short one). – Rex Kerr Sep 28 '10 at 17:25
  • If `a = {3, 3}`, `b = {7, 7}`, and `k = 10`, what is the expected output? – Arun Sep 29 '10 at 05:04
  • http://stackoverflow.com/questions/19175221/sum-of-two-integers-in-an-array-equals-k/24149313#24149313 – Nikhil Rupanawar Jun 10 '14 at 19:40

5 Answers5

18

If the arrays are sorted you can do it in linear time and constant storage.

  • Start with two pointers, one pointing at the smallest element of A, the other pointing to the largest element of B.
  • Calculate the sum of the pointed to elements.
  • If it is smaller than k increment the pointer into A so that it points to the next largest element.
  • If it is larger than k decrement the pointer into B so that it points to the next smallest element.
  • If it is exactly k you've found a pair. Move one of the pointers and keep going to find the next pair.

If the arrays are initially unsorted then you can first sort them then use the above algorithm. There a few different approaches for sorting them that you could use, depending on the type of data you expect:

A comparison sort will require O(n log n) time on average. The last two are faster than O(n log(n)) but can be impractical if the range of possible values in the input arrays is very large.

Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • Does that find all pairs, or just the first? – Platinum Azure Sep 28 '10 at 16:59
  • Once you have find one, you continue by increasing the pointer in A and the process continues until you've found an other pair. – Loïc Février Sep 28 '10 at 17:01
  • I had this solution in mind but even with two pointers i cannot avoid iterating over both the arrays completely , as i need all pairs for which condition hold true .Am i correct ? – TopCoder Sep 28 '10 at 17:01
  • If it is sorted then each pointer will always go forward (A) or backward (B) : 2*n moves maximum, so it's linear. – Loïc Février Sep 28 '10 at 17:03
  • @Mark Byers: Thanks for making it more clear with your edit. I suppose I just misunderstood (in the first version you never specified what to do after finding the pair, and I assumed print and terminate since you didn't specify explicit instructions). My apologies. I didn't downvote you, so no hard feelings I hope. – Platinum Azure Sep 28 '10 at 17:36
  • 1
    Its seems intuitive, but can you prove the correctness of this algorithm? – damned Aug 06 '12 at 12:01
  • Good answer. Also, hi Mark :-) – Jonas Kölker Apr 05 '16 at 20:30
  • If `a[i] + b[j] = k`, couldn't you update *both* pointers? It's not like `a[i] + b[j-1] = k` unless `b[j-1] = b[j]`, and in that case, if also `a[i] = a[i+1]`, what are you supposed to output? More concretely, if `a = [1, 1]` and `b = [2, 2]` and `k = 3`, should your output have length four? If so, run-length encode all runs of equal values to guarantee distinct elements. It adds complication, though: now you have to do a bit of arithmetic to figure out the index is the original array if you want to output e.g. `[(0, 0), (0, 1), (1, 0), (1, 1)]` (in my example). – Jonas Kölker Apr 05 '16 at 20:36
  • More to the point: in my example, no matter which pointer you bump first in case of equality, you will leave out one pair in your output. I'm intuiting that you can get away with only run-length encoding one array and bumping the pointer that points into the other, but I'm too tired to give a formal proof at the moment. – Jonas Kölker Apr 05 '16 at 21:13
  • You should incorporate @Jonas improvement. Also, the question does not stipulate unique elements, but your answer does. At least that's trivially fixed. – Deduplicator May 23 '20 at 15:29
11

If you have a limit on the maximum possible number (let's name it M) then you can have a solution in O(M+n).

Boolean array of false and mark as true all value for element of A. Then for each element b of B check if the element number K-b is marked as true.

You can improve it if you are using an hash-map instead of a big array. But I would not consider that in this kind of questions hash-map is kind of cheating.

Anyway it would give you O(n) for insertion and then O(n) for query, O(n) in total.

EDIT :

One case where this might be useful.

  • You have un-sorted vectors of size 10^6, so sorting them and doing the match is in O(n log n) with n = 10^6.
  • You need to do this operation 10^6 times (different vectors), complexity of O(n*n*log n).
  • Maximum value is 10^9.

Using my idea not with boolean but integer (incremented at each run) gives you a complexity of :

  • "O(10^9)" to create the array (also same complexity of space)
  • O(n) at each run, so O(n*n) for the total.

You are using more space but you've increased speed by a factor log(n) ~=20 in this case !

Loïc Février
  • 7,540
  • 8
  • 39
  • 51
  • +1 for the first solution which will work . But space complexity is a function of M ( maximum possible number ) and not n , allocating this much space in real world is not acceptable. – TopCoder Sep 28 '10 at 17:10
  • Yes, I know. But it depends of what M is. If all the values are between 1 and 1000 but you have 1000 000 of them in each array (un-sorted) then using it can be efficient. Depends of the problem ;) – Loïc Février Sep 28 '10 at 17:13
  • Why would you consider hashtable - cheating? You can write ad hoc implementation in 10 minutes (if no standard available) and get linear time and space complexity. Seems like the optimal solution. Anyway, +1 for mentioning it. – Nikita Rybak Sep 28 '10 at 18:13
  • Hash table is "cheating" because it does not improve big O, but seems to improve it to naive audiences due to average/typical cases. – R.. GitHub STOP HELPING ICE Jan 31 '12 at 06:48
3

I would create a hash table containing the elements of one array, then iterate the other array looking up k - a(n), generating an output element if the lookup succeeded. This will use O(n) space and time.

In C#, it might look like this:

var bSet = new HashSet(B);
var results = from a in A
              let b = k - a
              where bSet.Contains(b)
              select new { a, b };
Deduplicator
  • 44,692
  • 7
  • 66
  • 118
Gabe
  • 84,912
  • 12
  • 139
  • 238
1
template< typename T >
std::vector< std::pair< T, T > >
find_pairs( 
    std::vector< T > const & a, std::vector< T > const & b, T const & k  ) {

    std::vector< std::pair< T, T > > matches;

    std::sort( a.begin(), a.end() );  // O( A * lg A )
    std::sort( b.begin(), b.end() );  // O( B * lg B )

    typename std::vector< T >::const_iterator acit = a.begin();
    typename std::vector< T >::const_reverse_iterator bcit = b.rbegin();

    for( ; acit != a.end(); /* inside */ ) {
        for( ; bcit != b.rend(); /* inside */ ) {

            const T sum = *acit + *bcit;

            if( sum == k ) {
                matches.push_back( std::pair< T, T >( *acit, *bcit ) );
                ++acit;
                ++bcit;
            }
            else if( sum < k ) {
                ++acit;
            }
            else {  // sum > k
                ++bcit;
            }
        }
    }  // O( A + B )
    return matches;
}
Deduplicator
  • 44,692
  • 7
  • 66
  • 118
Arun
  • 19,750
  • 10
  • 51
  • 60
0

I used C++ and it seemed to give me the desired result. Hope this is what you were looking for.

using namespace std;

using data=std::pair<int,int>;

void search_pairs(std::vector<int>& A, std::vector<int>& B, const int total, std::vector<data>& output){

  std::sort(A.begin(),A.end(),[](const int i,const int j)->bool{return (i<j);});
  std::sort(B.begin(),B.end(),[](const int a,const int b)->bool{return (a<b);});
  std::vector<int>* minV(nullptr);
  std::vector<int>* maxV(nullptr);
  if(A.size()>B.size()) {minV=&B;maxV=&A;}
  else {minV=&A;maxV=&B;}
  for(auto&& itr:(*minV) ){
    auto remain(total-itr);
    if (std::binary_search (maxV->begin(), maxV->end(), remain)){
      data d{itr,remain};
      if (minV==&B) std::swap(d.first,d.second);
      output.push_back(d);
    }
  }
  if (minV==&B) std::reverse(output.begin(),output.end());  
}

int main() {

    size_t nb(0);
    scanf("%lu",&nb);
    for (size_t i=0;i<nb;++i){
        size_t a,b(0);
        int total(0);
        scanf("%lu %lu %d",&a,&b,&total);
        std::vector<int> A,B;
        for (size_t i=0;i<a;++i){
            int aux;
            scanf("%d",&aux);
            A.push_back(aux);
        } 
        for (size_t i=0;i<b;++i){
            int aux;
            scanf("%d",&aux);
            B.push_back(aux);
        } 
        std::vector<data> output;
        search_pairs(A, B, total, output);
    auto itr=std::begin(output);
    if (itr==std::end(output)) printf("-1"); 
    while (itr!=std::end(output)){
      printf("%d %d",(*itr).first, (*itr).second);
      if (++itr!=std::end(output)) printf(", ");
    }
        printf("\n");
    }
    //code
    return 0;
}
Deduplicator
  • 44,692
  • 7
  • 66
  • 118