0

I have an array containing a big number of elements—more than 2,000,000. I need to obtain the highest (or lowest) ranking 300 elements. So whenever I reach the first highest (or lowest) 300 elements of the array, return them. Currently Arrays.sort is used for the whole array, then the highest (or lowest) ranking elements are returned.

e.g.: {1,2,3,4,5,6,7,8,9} I want to obtain the highest 3 elements; I need: {9,8,7}

Any suggestions on this one?

EDIT

Best resource found so far, containing a research/comparison of different solutions:

http://www.michaelpollmeier.com/selecting-top-k-items-from-a-list-efficiently-in-java-groovy/

Source code for the article:

https://github.com/mpollmeier/Selection-Algorithms

Olimpiu POP
  • 5,001
  • 4
  • 34
  • 49

3 Answers3

4

You can use partial heapsort. Construct a minheap having 1st 300 elements.

Then as you traverse the array further, check whether the current element is greater than the root element of the heap. If it is greater, then delete the root element and add this new element.

After you finish with the entire array, your minHeap will have the largest 300 elements.

Extract the root element one by one. The elements will pop out in ascending order.

Note: The heap will always contain k(300) elements irrespective of the value of N, so heap operations shall be O(logk) in this case.

Hence order of complexity of this algorithm is O(Nlogk) where N is the size of array.

Space Complexity - O(k)

EDIT: If you want the lowest 300 elements, then similar algorithm can be followed using a maxheap instead of minheap.

Abhishek Bansal
  • 12,589
  • 4
  • 31
  • 46
  • 2
    it's `n log n` i reckon, not `n`; space is `n` because you can sort in place – guido May 28 '14 at 15:32
  • @guido The minHeap only has 300 elements. So heap operations in this case is constant time. – Abhishek Bansal May 28 '14 at 15:32
  • you need to look at the worst case, which is n log n; insertion in an heap is log n – guido May 28 '14 at 15:33
  • @guido Why do you say it is nlogn? Insertion in a heap of constant size is constant. – Abhishek Bansal May 28 '14 at 15:35
  • So if the heap becomes 1k or 10k or 100k or 1m constant size every operation will be constant too? – Luiggi Mendoza May 28 '14 at 15:40
  • @AbhishekBansal you have to scan n elements; potentially you need to insert each of them in the queue, each insertion is log n – guido May 28 '14 at 15:42
  • @LuiggiMendoza By constant, I mean not increasing with N, the size of the array. The operation shall be Nlogk if you consider k is also veriable. I have edited my answer. – Abhishek Bansal May 28 '14 at 15:43
  • And I'm asking if your heap is 1m constant size, all the operations will be O(1)? – Luiggi Mendoza May 28 '14 at 15:43
  • @LuiggiMendoza Yes, IMO asymptotically, the operations shall be O(1) if the heap size constant and is not increasing with N. Am I wrong? – Abhishek Bansal May 28 '14 at 15:45
  • The fact you know the value of N doesn't mean the operations would become O(1) instantly. – Luiggi Mendoza May 28 '14 at 15:46
  • 1
    @LuiggiMendoza I'm sorry I genuinely can't see where I am wrong. I am not saying you know the value of N (the size of array). But you know the number of top elements that you want (300 or k). Since the size of heap is always k (not N), the operations shall be log(k) or constant if k is constant. – Abhishek Bansal May 28 '14 at 15:50
  • This answer is correct. The confusion in the comments appears to be terminology. If the list has `n` items and you want to select `k` of them (i.e. `n=2,000,000` and `k=300` as in the question), then the complexity is `O(n log k)`. The heap contains `k` items, and in the worst case you have to do `n` insertions (and `n-k` removals). Insertion is `O(log k)`, and you have to do it `n` times. Thus, `O(n log k)`. – Jim Mischel May 29 '14 at 13:27
0

Would this work for you? This example sorts the top 4 elements in the array:

    double[] arr = new double[]{1.0,4.0,2.0,8.0,3.0,6.0,7.0,5.0};

    int nth = 4; //e.g. - sort the top 4 numbers
    Arrays.sort(arr,arr.length-nth-1,arr.length);
    System.out.println(Arrays.toString(arr));

Output:

[1.0, 4.0, 2.0, 3.0, 5.0, 6.0, 7.0, 8.0]
Edwin Torres
  • 2,774
  • 1
  • 13
  • 15
-1

Use (T[], int, int, java.util.Comparator), this will sort the given array (T[]) in the range specified, only the elements from the first int arg to the second int arg. The comparator is optional.

BusyProgrammer
  • 2,783
  • 5
  • 18
  • 31
DirkyJerky
  • 1,130
  • 5
  • 10