5

I am trying to find top 4 maximum value from integer array input. For example for given input array {1232, -1221, 0, 345, 78, 99} will return {1232, 345, 99, 78} as a top 4 maximum value. I have solved the requirement with following method below. But I am still not satisfy with its time efficiency. Is there any chance to optimize the method more as the input become larger? Any clues are really appreciated. Thank you.

public int[] findTopFourMax(int[] input) {
int[] topFourList = { Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE,       Integer.MIN_VALUE };
for (int current : input) {
    if (current > topFourList[0]) {
        topFourList[3] = topFourList[2];
        topFourList[2] = topFourList[1];
        topFourList[1] = topFourList[0];
        topFourList[0] = current;
    } else if (current > topFourList[1]) {
        topFourList[3] = topFourList[2];
        topFourList[2] = topFourList[1];
        topFourList[1] = current;
    } else if (current > topFourList[2]) {
        topFourList[3] = topFourList[2];
        topFourList[2] = current;
    } else if (current > topFourList[3]) {
        topFourList[3] = current;
    }
}
return topFourList;

}

Jon Kartago Lamida
  • 827
  • 1
  • 7
  • 12

6 Answers6

13

Simplest (though not most efficient) way will be to sort the array at take the subarray containing the last 4 elements.

You can use Arrays.sort() to sort and Arrays.copyOfRange() to take the subarray.

int[] arr = new int[] {1232, -1221, 0, 345, 78, 99};
Arrays.sort(arr);
int[] top4 = Arrays.copyOfRange(arr, arr.length-4,arr.length);
System.out.println(Arrays.toString(top4));

For more efficient solution, one can maintain a min-heap of top K elements or use selection algorithm to find the top 4th element. The two approaches are described in this thread.

Though the selection algorithm offers O(n) solution, the min-heap solution (which is O(nlogK)) should have better constants, and especially for small k is likely to be faster.

P.S. (EDIT):

For 4 elements, you might find that invoking a loop 4 times, and finding a max in each of them (and changing the old max to -infinity in each iteration) will be more efficient then the more "complex" approaches, since it requires sequential reads and have fairly small constants. This is of course not true for larger k, and decays into O(n^2) for k->n


EDIT2: benchmarking:

for the fun of it, I ran a benchmark on the attached code. The results are:

[naive, sort, heap] = [9032, 214902, 7531]

We can see that the naive and heap are much better then the sort based approach, and the naive is slightly slower then the heap based. I did a wilcoxon test to check if the difference between naive and heap is statistically significant, and I got a P_Value of 3.4573e-17. This means that the probability of the two approaches are "identical" is 3.4573e-17 (extremely small). From this we can conclude - heap based solution gives better performance then naive and sorting solution (and we empirically proved it!).

Attachment: The code I used:

public static int[] findTopKNaive(int[] arr, int k) {
    int[] res = new int[k];
    for (int j = 0; j < k; j++) { 
        int max=Integer.MIN_VALUE, maxIdx = -1;
        for (int i = 0; i < arr.length; i++) { 
            if (max < arr[i]) { 
                max = arr[i];
                maxIdx = i;
            }
        }
        arr[maxIdx] = Integer.MIN_VALUE;
        res[k-1-j] = max;
    }
    return res;
}

public static int[] findTopKSort(int[] arr, int k) { 
    Arrays.sort(arr);
    return Arrays.copyOfRange(arr, arr.length-k,arr.length);
}

public static int[] findTopKHeap(int[] arr, int k) { 
    PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
    for (int x : arr) { 
        if (pq.size() < k) pq.add(x);
        else if (pq.peek() < x) {
            pq.poll();
            pq.add(x);
        }
    }
    int[] res = new int[k];
    for (int i =0; i < k; i++) res[i] = pq.poll();
    return res;

}
public static int[] createRandomArray(int n, Random r) { 
    int[] arr = new int[n];
    for (int i = 0; i < n; i++) arr[i] = r.nextInt();
    return arr;
}
public static void main(String... args) throws Exception {
    Random r = new Random(1);
    int k = 4;
    int repeats = 200;
    int n = 5000000;
    long[][] results = new long[3][repeats];
    for (int i = 0; i < repeats; i++) { 
        int[] arr = createRandomArray(n, r);
        int[] myCopy;
        myCopy = Arrays.copyOf(arr, n);
        long start = System.currentTimeMillis();
        findTopKNaive(myCopy, k);
        results[0][i] = System.currentTimeMillis() - start;
        myCopy = Arrays.copyOf(arr, n);
        start = System.currentTimeMillis();
        findTopKSort(myCopy, k);
        results[1][i] = System.currentTimeMillis() - start;
        myCopy = Arrays.copyOf(arr, n);
        start = System.currentTimeMillis();
        findTopKHeap(myCopy, k);
        results[2][i] = System.currentTimeMillis() - start;
    }
    long[] sums = new long[3];
    for (int i = 0; i < repeats; i++) 
        for (int j = 0; j < 3; j++)
        sums[j] += results[j][i];
    System.out.println(Arrays.toString(sums));

    System.out.println("results for statistic test:");
    for (int i = 0; i < repeats; i++) { 
        System.out.println(results[0][i] + " " + results[2][i]);
    }
}
Community
  • 1
  • 1
amit
  • 175,853
  • 27
  • 231
  • 333
  • I will check the this time efficiency. – Jon Kartago Lamida Jan 02 '13 at 13:06
  • @JonKartagoLamida: For time efficiency, use the alternative approaches suggested. min-heap offers `O(nlogK)` solutions, with pretty good constants as well. – amit Jan 02 '13 at 13:07
  • @JonKartagoLamida: See last edit. I added a small benchmark comparing 3 approaches and statistically tested it. According to the benchmark - the heap based solution is faster then sort and the naive approach! – amit Jan 02 '13 at 13:57
  • You aren't properly warming up. Why not use Google Caliper? Once you have it set up, it is a breeze to microbenchmark anything. – Marko Topolnik Jan 02 '13 at 14:05
  • @MarkoTopolnik: Since we are talking 200 iterations, and it is interleaved (naive,sort,heap,naive,sort,heap,...) I doubt warm up will be significant, though it should be done in the general case, agreed. I am not familiar with Google Caliper, I'll read into it later on, thanks for the advice. – amit Jan 02 '13 at 14:10
  • Warmup is needed for the JIT compiler to take notice of the code and compile it. With HotSpot's default settings, this happens only after 10,000 method calls (or total passes over a piece of code in a loop, I think). – Marko Topolnik Jan 02 '13 at 14:17
2

You should check out this answer by Peter Lawrey. Basically, the idea is to run through your array, adding each element to a SortedSet and maintaining the size at four by removing the least element in each iteration. This process is O(n), even in the worst case, compared with O(n logn) typical and O(n2) worst case for fully sorting an array.

final List<Integer> input = new ArrayList(Arrays.asList(1232, -1221, 0, 345, 78, 99));
final NavigableSet<Integer> topFour = new TreeSet<>();
for (int i : input) {
  topFour.add(i);
  if (topFour.size() > 4) topFour.remove(topFour.first());
}
System.out.println(topFour);
Community
  • 1
  • 1
Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
  • 2
    Caution that this approach does not handle dupe elements in the input array. (not as a set anyway) – amit Jan 02 '13 at 13:27
  • True; for just four elements it could be replaced with a simple `ArrayList` with almost no performance loss. – Marko Topolnik Jan 02 '13 at 13:30
  • Or you can simply use a `PriorityQueue`, which supports duplicate elements, and is expected to be faster then a sorted set (though not significantly for k=4). If you are interested I just editted my answer and added a benchmark comparing 3 different approaches and found that a heap based solution is actually the fastest among them. – amit Jan 02 '13 at 14:00
  • Yes, I started out thinking in terms of `PriorityQueue`, but went away because it is undbounded, instead of just adding code that makes it bounded. – Marko Topolnik Jan 02 '13 at 14:04
1

The easiest way is to sort the array and take the first/last 4 elements.

In the end, the max 4 entries can be anywhere, so whatever you do, you need to read the whole array and it will be an O(n) operation.

assylias
  • 321,522
  • 82
  • 660
  • 783
1

The mentions before about sorting the array truly provide the easiest way, but not really the most efficient.

A variation on QuickSort (Quickselect), can be used to find the kth largest/smallest value in a collection.

http://en.wikipedia.org/wiki/Selection_algorithm

A correct implementation allows you to get the kth largest in O(n) time.

Basically you partition like in quicksort using a pivot, and compare the pivot position after each iteration with the position you want (four in your case), if it's equal, return the position, otherwise, apply the algorithm to the correct half of the input.

When you've found the index of the kth largest value, you can simply iterate over the array again and get the values inferior to the input[k].

This might be overkill for your case, since you need exactly four, but it's the most generic way of doing this.

If you don't care about memory too much, you can also use a Bounded PriorityQueue that keeps the top/bottom X values, and simply insert everything in the Queue. The ones that remain are the values you're interested in.

pcalcao
  • 15,789
  • 1
  • 44
  • 64
  • For now the main consideration is time efficiency only not storage efficiency. That's why I avoid sorting the whole array first because there will be the case when the array input will be so large. – Jon Kartago Lamida Jan 02 '13 at 13:24
1

Sort : sort the array and take the last four elements

Min Heap : The simplest solution for this is maintaining a min heap of max size 4.

This solution is O(nlogk) complexity, where n is the number of elements and k is the number of elements you need.

Priority Queue : you can create a PriorityQueue with a fixed size and a custom comparator as explained in this question with implementation.

Selection Algorithm : you can use selection algorithm, you can find the (n-k)th maximum element and then return all the elements which are higher than this element but it is harder to implement. Best case complexity : O(n)

Community
  • 1
  • 1
Rahul
  • 15,979
  • 4
  • 42
  • 63
-1
float a[] = {1.0f,3.0f,5.0f,6.0f,7.0f,10.0f,11.0f,3.2f,4.0f};

float first =0.0f;
float second=0.0f;
float third =0.0f;
for (int i=0; i<a.length; i++){
    if(first < a[i]){
        first=a[i];
    }
}
System.out.println("first largest is "+first);
for (int j=0; j<a.length; j++){
    if(a[j] <first && a[j] > second){
        second = a[j];
    }
}
System.out.println("second largest is "+second);
for (int k=0;k<a.length; k++){
    if(a[k]<second && a[k]>third){
        third =a[k];
    }
}
System.out.println("third largest is "+third);
  • the above code helps for any number of largest numbers with out sorting them – user3916211 Aug 06 '14 at 22:14
  • This really isn't any better than OP's initial attempt. And it will fail if the largest elements contan repeats, and also if any of them is negative. – vonbrand Aug 06 '14 at 23:01