7

I have this question:

Given two sorted lists (stored in arrays) of size n, find an O(log n) algorithm that computes the nth largest element in the union of the two lists.

I can see there is probably a trick here as it requires the nth largest element and the arrays are also of size n, but I can't figure out what it is. I was thinking that I could adapt counting sort, would that work?

asheeshr
  • 4,088
  • 6
  • 31
  • 50
Joe
  • 365
  • 5
  • 11
  • How does union of 2 lists work? – nhahtdh Jan 26 '13 at 10:15
  • Are the arrays disjoint? If not, I'm pretty sure it can't be done below `Th(n)` (and it's trivial at `Th(n)`) – John Dvorak Jan 26 '13 at 10:18
  • 2
    @Gumbo: If the array size is > n, and assuming that "union of 2 lists" just mean putting all of them together, then we can always trim the array to size n (due to the sorted property). – nhahtdh Jan 26 '13 at 10:20
  • How do you do it then? I can't see how it can be done without sorting the union, which means it would be O(nlog n) – Joe Jan 26 '13 at 10:21
  • You can get O(n) easily given both lists are initially sorted – lc. Jan 26 '13 at 10:22
  • 1
    `O(n)` is the naive way: Just compare the smallest element from 2 list, advance the pointer and count. – nhahtdh Jan 26 '13 at 10:23
  • 2
    **If** the arrays are _sorted_ and _disjoint_, **then** it might be possible in `O(log n)` (I'm thinking in terms of a binary search). Otherwise - not a chance. – John Dvorak Jan 26 '13 at 10:23
  • 1
    @Joe: Can you answer my first question? We are confused whether `union` here means union of set, which removes duplicate element, or is it simply put 2 lists together? – nhahtdh Jan 26 '13 at 10:24
  • @nhahtdh union of two multisets usually means that the larger of both counts are used. If the counts are added, then I would call the operation "merge", not "(multiset) union". – John Dvorak Jan 26 '13 at 10:26
  • @nhahtdh That is confusing me as well. I would assume duplicates are included. – Joe Jan 26 '13 at 10:28
  • @Jan Dvorak Could you give me a hint on how binary search might be used? – Joe Jan 26 '13 at 10:28
  • @Joe with duplicates it's provably impossible under `O(k)` to find the k-th largest elements. Without duplicates (or with a merge instead of union), it might be possible (but I don't have a concrete algorithm either). – John Dvorak Jan 26 '13 at 10:31
  • @Joe basically - search in both lists in parallel, but not by a value but rather by a relation (in values and indexes). I don't know if that's a viable algorithm, but I'll explore. – John Dvorak Jan 26 '13 at 10:32
  • @Jan Dvorak Okay then, some I guess it doesn't include duplicates. Could I maybe modify merge sort or binary search to get what I want? – Joe Jan 26 '13 at 10:32
  • @Joe: With duplicate, I can do (log n)^2. – nhahtdh Jan 26 '13 at 10:33
  • @Joe merge sort is `O(k)`, even `Theta(k)` – John Dvorak Jan 26 '13 at 10:33
  • @JanDvorak: It *can* be done in O(log(n)) See [How to find the kth smallest element in the union of two sorted arrays?](http://stackoverflow.com/a/8935157/4279) – jfs Jan 26 '13 at 10:36
  • possible duplicate of [How to find the kth smallest element in the union of two sorted arrays?](http://stackoverflow.com/questions/4607945/how-to-find-the-kth-smallest-element-in-the-union-of-two-sorted-arrays) – nhahtdh Jan 26 '13 at 10:39
  • @J.F.Sebastian this answer assumes that the arrays are disjoint or the union is a merge. – John Dvorak Jan 26 '13 at 11:34
  • @JanDvorak: I don't see any restrictions. Could you provide a counter-example? – jfs Jan 26 '13 at 12:10
  • @J.F.Sebastian `{1,2,3} ⋃ {1,2,3} = {1,2,3}`. The third largest element is `1`, but the algorithm claims it's `2`, seeing the union as `{1,1,2,2,3,3}`. – John Dvorak Jan 26 '13 at 12:12
  • @JanDvorak: Given description in the question, the second interpretation (without removing duplicates) is the correct one. Though the title along due to the word "number" instead of "element" is ambiguous. – jfs Jan 26 '13 at 12:37

3 Answers3

8

Compare A[n/2] and B[n/2]. If equal, any of them is our result. Other stopping condition for this algorithm is when both arrays are of size 1 (either initially or after several recursion steps). In this case we just choose the largest of A[n/2] and B[n/2].

If A[n/2] < B[n/2], repeat this procedure recursively for second half of A[] and first half of B[].

If A[n/2] > B[n/2], repeat this procedure recursively for second half of B[] and first half of A[].

Since on each step the problem size is (in worst case) halved, we'll get O(log n) algorithm.


Always dividing array size by two to get the index works properly only if n is a power of two. More correct way of choosing indexes (for arbitrary n) would be using the same strategy for one array but choosing complementing index: j=n-i for other one.

Evgeny Kluev
  • 24,287
  • 7
  • 55
  • 98
  • 2
    How would you take care of the case {1}, {2}? – nhahtdh Jan 26 '13 at 10:37
  • Ok, I see that. Pretty obvious now you mention it. Thanks for your help everyone. – Joe Jan 26 '13 at 10:42
  • 1
    Like nhahtdh mentionned, you got the main idea in the answer but you must keep a track of how many elements you have discarded so far to know how to split the list when you have uneven lengths and finally pick up the good one when the remaining lists consist of 1 single value. – Rerito Jan 26 '13 at 10:46
  • Updated answer clarifies this algorithm according to suggestions by nhahtdh and Rerito. – Evgeny Kluev Jan 26 '13 at 11:00
  • always [dividing the array size by 2 works in general case](http://stackoverflow.com/a/11681092/4279). It is not necessary for `n` to be power of two. – jfs Jan 26 '13 at 11:04
  • Note that this doesn't work if the arrays are not disjoint and the union is a union in the set-union sense. – John Dvorak Jan 26 '13 at 11:35
  • @Evgeny Kluev I don't think this algorithm works for {1, 3, 8, 20, 30, 31} and {2, 4, 6, 21, 29, 32}. The 6th largest number is 8. But the algorithm will compare 8 and 6 in its first step and continue to look at {1,3} and {21,29,32} in which case 8 will never be reconsidered. – allenylzhou May 23 '14 at 05:03
  • @allenylzhou: in your example second step should process arrays {1, 3, 8} and {21,29,32} (half of length-6 array is definitely a length-3 array), so 8 will be reconsidered and finally found. Still your concerns are valid. I didn't tell how exactly indexes should be calculated. But in correct implementation we need to do it very carefully, to avoid off-by-one errors. – Evgeny Kluev May 23 '14 at 08:44
2

Evgeny Kluev gives a better answer - mine was O(n log n) since i didn't think about them as being sorted.

what i can add is give you a link to a very nice video explaining binary search, courtesy of MIT:

https://www.youtube.com/watch?v=UNHQ7CRsEtU

  • Putting 2 sorted list together into a sorted list will take O(n). – nhahtdh Jan 26 '13 at 10:34
  • he can choose not to concat them and iterate between the two. aswell as view this in terms of amortization. it depends on how you implement this. he doesn't know if the lists contain repetitions which they might. it could be that i'm still mistaken though, perhaps he doesn't need to concat them at all. – Shokodemon Jan 26 '13 at 10:42
  • i see now that my logic was O(n log n), i'll edit my post to reflect this. – Shokodemon Jan 26 '13 at 10:48
  • @Shokodemon: You know that O(n log n) is worse than the normal way of going through both array, which yield at most O(n). – nhahtdh Jan 26 '13 at 10:49
  • yes, my fault for not reading the question well enough, apologies. – Shokodemon Jan 26 '13 at 11:11
1
public static void main(String[] args) {  


int[] fred = { 60, 5, 7, 3, 20, 3, 44 };  

int[] tmp = new int[fred.length];  
go(fred, 1, tmp, 3);  
}  

public static void go(int[] fred, int cnt, int[] tmp, int whatPosition) {  
int max = 0;  
int currentPosition = 0;  
for (int i = 0; i < fred.length; i++) {  
if (i == 0)  
max = fred[i];  
else {  
if (fred[i] > max) {  
max = fred[i];  
currentPosition = i;  
}  
}  
}  
System.arraycopy(fred, 0, tmp, 0, fred.length);  
tmp[currentPosition] = 0;  
cnt++;  
if(cnt != whatPosition)  
go(tmp, cnt, tmp, whatPosition);  
else{  
for (int i = 0; i < tmp.length; i++) {  
if (i == 0)  
max = tmp[i];  
else {  
if (tmp[i] > max) {  
max = tmp[i];  
}  
}  
}  
System.out.println(max);  
}  




}  
Madara
  • 150
  • 1
  • 8