How to find the kth largest element in the union of two sorted arrays?

Question

I need to find the k largest element in two sorted arrays, but with a twist.

This algorithm assumes k<=max(m,n) and indexes go awry when k>max(m,n). In my problem i know that will always be k>(m+n)/2 and hence k>min(m,n) so i need to change Jules Olléon's answer a little bit...i just don't see which bit :~

I've found this link page 3, but there is bug there (when implemented, it doesn't return the right answer)

I know a quick fix would be to multiply both arrays by -1 and take the k smallest of that union and multiply back the answer by -1 but that'll make the code unreadable.

This is not a homework.

Ok, i think i'm missunderstanding Neil's answer or something else because this is what i give 'him'

#include <algorithm>
#include <fstream>
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <vector>

#include <Eigen/Dense>
using namespace Eigen;
using Eigen::VectorXf;
using Eigen::VectorXi;

float getNth(VectorXf& v1,VectorXf& v2,int& n){
        int step=(n/4),i1=(n/2),i2=(n-i1);
        while(!(v2(i2)>=v1(i1-1) && v1(i1)>v2(i2-1))){                   
            if(v1(i1-1)>=v2(i2-1)){
                i1-=step;
                i2+=step;
            } else {
                i1+=step;
                i2-=step;
            }
            step/=2;
            if(!step) step=1;
        }
        if(v1(i1-1)>=v2(i2-1))
            return v1(i1-1);
            else
            return v2(i2-1);    
}
int main(){  
    int p,q,n,k,l;
    float sol;
    std:: cout << "enter p " << std::endl;
    std::cin >> p; 
    std:: cout << "enter q " << std::endl;
    std::cin >> q;
    n=p+q;
    std:: cout  << " enter k larger than " << std::min(p,q) << " and smaller than " << n-1 << std::endl;
    std::cin >> k;
    
    k=n-k-1;
    
    srand(time(NULL));
    VectorXf v1=VectorXf::Random(p);
    srand(time(NULL));
    VectorXf v2=VectorXf::Random(q);
    VectorXf v3(n);
    v3 << v1, v2;
    std::sort(v3.data(),v3.data()+v3.size(),std::greater<float>()); //std::greater<float>()
    std::sort(v1.data(),v1.data()+v1.size(),std::greater<float>());
    std::sort(v2.data(),v2.data()+v2.size(),std::greater<float>());
    
    sol=getNth(v1,v2,k);
    std::cout << sol << std::endl;
    std::cout << v3(k) <<   std::endl;
    return 0;  
}

and this is what i get:

enter p 
12
enter q 
32
 enter k larger than 12 and smaller than 43
13
nthoftwo: /Desktop/work/p1/geqw4/vi3/out/sp/ccode/eigen/Eigen/src/Core/DenseCoeffsBase.h:409: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator()(Eigen::DenseCoeffsBase<Derived, 1>::Index) [with Derived = Eigen::Matrix<float, -0x00000000000000001, 1>, Eigen::DenseCoeffsBase<Derived, 1>::Scalar = float, Eigen::DenseCoeffsBase<Derived, 1>::Index = long int]: Assertion `index >= 0 && index < size()' failed.
Aborted (core dumped)

if you are unfamiliar with eigen: the error is an index out of bound error caused by getNth(v1,v2,k)

EDIT:

this is a very minor modification on J.F. Sebastian' simple and elegant solution below --all mistakes are mine, but it seems to work. The aim was to work with the original indexes (i.e. i'm not sure Neil's idea is indispensable).

#include <algorithm>
#include <fstream>
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <vector>
#include <cassert>
#include <iterator>

#include <Eigen/Dense>
using namespace Eigen;
using Eigen::VectorXf;
using Eigen::VectorXi;

template<class RandomIterator,class Compare>
typename std::iterator_traits<RandomIterator>::value_type
nsmallest(RandomIterator firsta,RandomIterator lasta,RandomIterator firstb,RandomIterator lastb,size_t n,Compare less) {
  assert(n<static_cast<size_t>((lasta-firsta)+(lastb-firstb)));
  if (firsta==lasta) return *(firstb+n);
  if (firstb==lastb) return *(firsta+n);

  size_t mida=(lasta-firsta)/2;
  size_t midb=(lastb-firstb)/2;
  if ((mida+midb)<n)
    return less(*(firstb+midb),*(firsta+mida))?
      nsmallest(firsta,lasta,firstb+midb+1,lastb,n-(midb+1),less):
      nsmallest(firsta+mida+1,lasta,firstb,lastb,n-(mida+1),less);
  else
    return less(*(firstb+midb),*(firsta+mida))?
      nsmallest(firsta,firsta+mida,firstb,lastb,n,less):
      nsmallest(firsta,lasta,firstb,firstb+midb,n,less);
}
int main(){  
    int p,q,n,k,l;
    float sol;
    std:: cout << "enter p " << std::endl;
    std::cin >> p; 
    std:: cout << "enter q " << std::endl;
    std::cin >> q;
    n=p+q;
    std:: cout  << " enter k larger than " << std::min(p,q) << " and smaller than " << n-1 << std::endl;
    std::cin >> k;
    
    srand(time(NULL));
    VectorXf v1=VectorXf::Random(p);
    srand(time(NULL));
    VectorXf v2=VectorXf::Random(q);
    VectorXf v3(n);
    v3 << v1, v2;
    std::sort(v3.data(),v3.data()+v3.size()); 
    std::sort(v1.data(),v1.data()+v1.size());
    std::sort(v2.data(),v2.data()+v2.size());
    
    sol=nsmallest(v1.data(),v1.data()+v1.size(),v2.data(),v2.data()+v2.size(),k,std::less<float>());
//if it works, these two should return the same.
    std::cout << sol << std::endl;  
    std::cout << v3(k) << std::endl;
    return 0;  
}

score 6 · Accepted Answer · edited May 23 '17 at 12:34

From your comments I understand that you want to find k-th smallest value given 2 inversely sorted array e.g., for a={5,4,3}, b={2,1,0}; and k=1 the expected result is 0 i.e., the minimum value -- the first smallest value (it means that k is counted from 1).

Given nsmallest() function that works on sorted arrays and accepts a custom comparator you could:

#include <functional> // greater<>
#include <iostream>

#define SIZE(a) (sizeof(a) / sizeof(*a))

int main() {
  int a[] = {5,4,3};
  int b[] = {2,1,0};
  int k = 1; // find minimum value, the 1st smallest value in a,b

  int i = k - 1; // convert to zero-based indexing
  int v = nsmallest(a, a + SIZE(a), b, b + SIZE(b),
            SIZE(a)+SIZE(b)-1-i, std::greater<int>());
  std::cout << v << std::endl; // -> 0
  return v;
}

I've used @Neil's suggestion to fix the index and @lambdapilgrim's answer for the algorithm:

#include <cassert>
#include <iterator>

template<class RandomIterator, class Compare>
typename std::iterator_traits<RandomIterator>::value_type
nsmallest(RandomIterator firsta, RandomIterator lasta,
          RandomIterator firstb, RandomIterator lastb,
          size_t n,
          Compare less) {
  assert(n < static_cast<size_t>((lasta - firsta) + (lastb - firstb)));
  if (firsta == lasta) return *(firstb + n);
  if (firstb == lastb) return *(firsta + n);

  size_t mida = (lasta - firsta) / 2;
  size_t midb = (lastb - firstb) / 2;
  if ((mida + midb) < n)
    return less(*(firstb + midb), *(firsta + mida)) ?
      nsmallest(firsta, lasta, firstb + midb + 1, lastb, n - (midb + 1), less) :
      nsmallest(firsta + mida + 1, lasta, firstb, lastb, n - (mida + 1), less);
  else
    return less(*(firstb + midb), *(firsta + mida)) ?
      nsmallest(firsta, firsta + mida, firstb, lastb, n, less) :
      nsmallest(firsta, lasta, firstb, firstb + midb, n, less);
}

wao: such an elegant and robust solution: this *is* awesome :), you have much improved the original implementation. Thank you very much sire. I've slightly modified you code to work with the original *k* --see edit to question. — user189035, Jul 27 '12 at 07:41
@user189035: the nice thing about the above code is that it is a tail-recursive function. Some compilers can perform tail-call optimization (TCO). [A manually written iterative version](http://stackoverflow.com/a/11698659/4279) is only marginally faster than the above recursive variant. — jfs, Jul 28 '12 at 05:57
Correct me if I am wrong - mida should be (firsta+lasta)/2...similarly mid2 — DJ', Sep 04 '13 at 07:59
@DJ': The difference of two iterators can be interpreted as a distance between them. How do you interpret their sum? The sum would probably produce a compilation error (`a+b` is not defined if both `a` and `b` are iterators). — jfs, Sep 04 '13 at 08:55
My bad- I assumed them to be the index positions in the arrays — DJ', Sep 07 '13 at 00:01

score 4 · Answer 2 · answered Jul 26 '12 at 23:38

4

The kth largest element is also the m + n + 1 - kth* smallest element so you could try solving the problem that way.

*Counting from 1. If k is counting from 0, use m + n - 1 - k instead.

answered Jul 26 '12 at 23:38

Neil

54,642
8
60
72

what you propose is basically the solution i wrote in the fourth paragraph of my question. I'm not saying it's bad...i just hope there is a simpler way :) – user189035 Jul 26 '12 at 23:42
@user189035: no, your solution involves multiplying by -1, Neil's answer doesn't – jfs Jul 27 '12 at 00:12
@J.F.Sebastian but the code i linked to assume the observations are sorted in increasing order....if you don't multiply by -1 but sort in decreasing order instead it doesn't work; back at square one. – user189035 Jul 27 '12 at 00:40
@user189035: could you provide an example `a`, `b`, `k` such that `nsmallest(a,b,m+n-1-k) != nlargest(a,b,k)` where `a`,`b` sorted and `0 <= k < (len(a) + len(b))`, note: `nsmallest(a,b,k) == sorted(a+b)[k]`, `nlargest(a, b, k) == sorted(a+b,reverse=True)[k]`. – jfs Jul 27 '12 at 01:03
that's not really where i have a problem...the problem is that if you place the a,b in descending order, the getsmallest breaks down... – user189035 Jul 27 '12 at 01:24
@user189035: given `a={5,4,3}, b={2,1,0}; and k=1` what do you expect to get? – jfs Jul 27 '12 at 01:31
{0} ie b(0). but, really, my problem is with the indexes striclty inside min(m,n),n-1 the extremes cases i will write the two if/then for them latter. Do you see what the problem is? – user189035 Jul 27 '12 at 01:43
@user189035: you might mean `b[2] == 0` because `b[0] == 2` and it is neither the k-th largest nor smallest value (`k==1`). – jfs Jul 27 '12 at 04:24

score 0 · Answer 3 · answered Jul 26 '12 at 23:29

I believe you want something analogous to the merge step of mergesort, where you incrementally compare the ith element of m against the jth element of n - but instead of stashing the values in an array, you're just looking for the kth smallest - so when you find it, return that value (and/or its index) and exit the function.

score 0 · Answer 4 · answered Jul 27 '12 at 03:34

0

I am not sure what the deal is about k>max(m,n)!

A simple solution:

def find(v1, start1, end1, v2, start2, end2, k):
    i = (start1+end1)/2
    j = binsearchrank(v2, start2, end2, v1[i])
    ranki = (i-start1+1) + (j-start2)
    if ranki > k:
        return find(v2, start2, j, v1, start1, i, k)
    elif ranki < k:
        return find(v2, j, end2, v1, i+1, end1, k-ranki)
    else:
        return v1[i]

Complexity is O(log^2n)

answered Jul 27 '12 at 03:34

ElKamina

7,747
28
43

It may simply be that i'm not a programmer and these things are more complicated to an untrained mind. To come back to your proposal -- what is binsearchrank() --from your O() number i assume it is binary search, right? --i'm a bit confused by the "rank" appended to the name-- Also, this will typically not be important in general, but just to point out that in my case log(n) is>10, and J.F. Sebastian's solution is O(log(n)) :) --a factor 10 of difference is problematic because this is part of an application that spends half its time on this finding part. – user189035 Jul 27 '12 at 07:56
1

@user189035 Yes. binsearchrank does return rank. You are right. O(logn) is much better for your case and you should use it. – ElKamina Jul 27 '12 at 08:13

How to find the kth largest element in the union of two sorted arrays?

EDIT:

4 Answers4

Linked