Find Second largest number in array at most n+log₂(n)−2 comparisons

Question

You are given as input an unsorted array of n distinct numbers, where n is a power of 2. Give an algorithm that identifies the second-largest number in the array, and that uses at most n+log₂(n)−2 comparisons.

http://stackoverflow.com/questions/251781/how-to-find-the-kth-largest-element-in-an-unsorted-array-of-length-n-in-on — JotaBe, Mar 27 '12 at 12:47
I don't think that can be solution Reason being the number of comparisons allowed. Suppose n= 16 then I will end up doing 22 comparisons this way since I have to find second best then I will have to always store two numbers each stage and besides last stage it will be always two comparisons. — Amit Deshpande, Mar 27 '12 at 13:30
Can you please let me know how can I optimize Selection algorithm to minimize my comparisons for this case? — Amit Deshpande, Mar 27 '12 at 13:31

score 58 · Accepted Answer · edited Jun 14 '13 at 18:14

58

Start with comparing elements of the n element array in odd and even positions and determining largest element of each pair. This step requires n/2 comparisons. Now you've got only n/2 elements. Continue pairwise comparisons to get n/4, n/8, ... elements. Stop when the largest element is found. This step requires a total of n/2 + n/4 + n/8 + ... + 1 = n-1 comparisons.
During previous step, the largest element was immediately compared with log₂(n) other elements. You can determine the largest of these elements in log₂(n)-1 comparisons. That would be the second-largest number in the array.

Example: array of 8 numbers [10,9,5,4,11,100,120,110].

Comparisons on level 1: [10,9] ->10 [5,4]-> 5, [11,100]->100 , [120,110]-->120.

Comparisons on level 2: [10,5] ->10 [100,120]->120.

Comparisons on level 3: [10,120]->120.

Maximum is 120. It was immediately compared with: 10 (on level 3), 100 (on level 2), 110 (on level 1).

Step 2 should find the maximum of 10, 100, and 110. Which is 110. That's the second largest element.

edited Jun 14 '13 at 18:14

CodesInChaos

106,488
23
218
262

answered Mar 27 '12 at 13:34

Evgeny Kluev

24,287
7
55
98

Can you please elaborate more. Consider an example I have array of 8 numbers 10,9,5,4,11,100,120,110 Now if I do pair wise comparisons it will be [10,9] ->10 [5,4]-> 5, [11,100]->100 , [120,110]-->120 now 110 is lost since not it will not come in to any comparison. – Amit Deshpande Mar 27 '12 at 13:42
2

@AmitDeshpande, 110 is not lost. In fact, it is compared with the maximum value (120). So it should be included to the set of elements to be compared with each other on step 2 of the algorithm. – Evgeny Kluev Mar 27 '12 at 13:53
sorry but it is still not clear with me it will add one more comparison on each level which leads to number of comparisons to n+n/2-2 not the expected result. Can you please give me example so that I could understand it better. – Amit Deshpande Mar 27 '12 at 14:05
@EvgenyKluev would you implement step 2 with a HashMap that maps every biggest element to the elements it has been compared to? – codd Dec 02 '12 at 17:20
2

@codd: no, I would not use HashMap because HashMap may use some additional comparisons, which does not satisfy OP requirements. I would use a binary tree, implemented as arrays of sizes N, N/2, N/4, ..., containing original array's indexes. – Evgeny Kluev Dec 02 '12 at 17:49
@EvgenyKluev Could you elaborate on how you build the tree and select all elements compared with the largest one in constant time? – codd Dec 05 '12 at 00:37
2

@codd: just fill these additional arrays with indexes of smaller element after each comparison. Example from this answer gives [1,3,4,7], [2,5], [0]. After finding the maximum, take its index (6), divide it by 2 several times, which gives indexes [3,1,0]. Then use these indexes to get original array's indexes from these additional arrays: [7,5,0], which means we should compare values 110, 100, and 10. – Evgeny Kluev Dec 05 '12 at 10:18
@EvgenyKluev: How's your approach for step 2 is better than hashtable? – user Jul 10 '13 at 17:58
@user: as I replied earlier, hashtable uses additional element comparisons to check for collisions and possibly to resolve collisions. This violates OP requirements. If step 2 is implemented with binary tree, no additional comparisons are needed. – Evgeny Kluev Jul 10 '13 at 18:55
@EvgenyKluev Which algorithm is better, the one you said or median of medians algorithm? Are both n+logn ? – fredcrs Jul 11 '16 at 17:09
1

@fredcrs: Median of medians has got a lot of improvements recently (see last part of this video: https://vimeo.com/171927600), still it uses much more comparisons than algorithm presented here: instead of n+logn it needs k*n comparisons where k is some constant, dependent on implementation, this constant is always greater than 1. – Evgeny Kluev Jul 11 '16 at 19:40

Abhijit Sarkar · Answer 2 · 2021-09-27T00:49:34.427

sly s's answer is derived from this paper, but he didn't explain the algorithm, which means someone stumbling across this question has to read the whole paper, and his code isn't very sleek as well. I'll give the crux of the algorithm from the aforementioned paper, complete with complexity analysis, and also provide a Scala implementation, just because that's the language I chose while working on these problems.

Basically, we do two passes:

Find the max, and keep track of which elements the max was compared to.
Find the max among the elements the max was compared to; the result is the second largest element.

In the picture above, 12 is the largest number in the array, and was compared to 3, 1, 11, and 10 in the first pass. In the second pass, we find the largest among {3, 1, 11, 10}, which is 11, which is the second largest number in the original array.

Time Complexity:

All elements must be looked at, therefore, n - 1 comparisons for pass 1.
Since we divide the problem into two halves each time, there are at most log₂n recursive calls, for each of which, the comparisons sequence grows by at most one; the size of the comparisons sequence is thus at most log₂n, therefore, log₂n - 1 comparisons for pass 2.
Total number of comparisons <= (n - 1) + (log₂n - 1) = n + log₂n - 2

def second_largest(nums: Sequence[int]) -> int:
    def _max(lo: int, hi: int, seq: Sequence[int]) -> Tuple[int, MutableSequence[int]]:
            if lo >= hi:
                return seq[lo], []
            mid = lo + (hi - lo) // 2
            x, a = _max(lo, mid, seq)
            y, b = _max(mid + 1, hi, seq)

            if x > y:
                a.append(y)
                return x, a
            b.append(x)
            return y, b

    comparisons = _max(0, len(nums) - 1, nums)[1]
    return _max(0, len(comparisons) - 1, comparisons)[0]

The first run for the given example is as follows:

lo=0, hi=1, mid=0, x=10, a=[], y=4, b=[]
lo=0, hi=2, mid=1, x=10, a=[4], y=5, b=[]
lo=3, hi=4, mid=3, x=8, a=[], y=7, b=[]
lo=3, hi=5, mid=4, x=8, a=[7], y=2, b=[]
lo=0, hi=5, mid=2, x=10, a=[4, 5], y=8, b=[7, 2]
lo=6, hi=7, mid=6, x=12, a=[], y=3, b=[]
lo=6, hi=8, mid=7, x=12, a=[3], y=1, b=[]
lo=9, hi=10, mid=9, x=6, a=[], y=9, b=[]
lo=9, hi=11, mid=10, x=9, a=[6], y=11, b=[]
lo=6, hi=11, mid=8, x=12, a=[3, 1], y=11, b=[9]
lo=0, hi=11, mid=5, x=10, a=[4, 5, 8], y=12, b=[3, 1, 11]

Things to note:

There are exactly n - 1=11 comparisons for n=12.
From the last line, y=12 wins over x=10, and the next pass starts with the sequence [3, 1, 11, 10], which has log₂(12)=3.58 ~ 4 elements, and will require 3 comparisons to find the maximum.

its not clear, that from the first pass we do n-1 comparations. On the lowest level we do n/2 comparations, then n/4 and so on. — nutella_eater, Sep 26 '21 at 19:57
@nutella_eater See edit, HTH. If you still don't understand, run the algorithm on various arrays and step through the code with a debugger or by printing to stdout. — Abhijit Sarkar, Sep 27 '21 at 00:47
yeah, I know, running algo will give you n-1, but how to prove it for all n? — nutella_eater, Sep 27 '21 at 07:07
@nutella_eater This answer explains the gist of the algorithm very simply, it doesn’t attempt to formally prove the algorithm. You can prove it by induction. Or read the paper I linked to. If you can’t do it yourself, try opening a new question without piggybacking on this answer. — Abhijit Sarkar, Sep 27 '21 at 09:30

score 4 · Answer 3 · edited Sep 27 '21 at 20:17

I have implemented this algorithm in Java answered by @Evgeny Kluev. The total comparisons are n+log2(n)−2. There is also a good reference: Alexander Dekhtyar: CSC 349: Design and Analyis of Algorithms. This is similar to the top voted algorithm.

public class op1 {
    
    private static int findSecondRecursive(int n, int[] A){
        int[] firstCompared = findMaxTournament(0, n-1, A); //n-1 comparisons;
        int[] secondCompared = findMaxTournament(2, firstCompared[0]-1, firstCompared); //log2(n)-1 comparisons.
        //Total comparisons: n+log2(n)-2;
        return secondCompared[1];
    }
    
    private static int[] findMaxTournament(int low, int high, int[] A){
        if(low == high){
            int[] compared = new int[2];
            compared[0] = 2;
            compared[1] = A[low];
            return compared;
        }
        int[] compared1 = findMaxTournament(low, (low+high)/2, A);
        int[] compared2 = findMaxTournament((low+high)/2+1, high, A);
        if(compared1[1] > compared2[1]){
            int k = compared1[0] + 1;
            int[] newcompared1 = new int[k];
            System.arraycopy(compared1, 0, newcompared1, 0, compared1[0]);
            newcompared1[0] = k;
            newcompared1[k-1] = compared2[1];
            return newcompared1;
        }
        int k = compared2[0] + 1;
        int[] newcompared2 = new int[k];
        System.arraycopy(compared2, 0, newcompared2, 0, compared2[0]);
        newcompared2[0] = k;
        newcompared2[k-1] = compared1[1];
        return newcompared2;
    }
    
    private static void printarray(int[] a){
        for(int i:a){
            System.out.print(i + " ");
        }
        System.out.println();
    }
    
    public static void main(String[] args) {
        //Demo.
        System.out.println("Origial array: ");
        int[] A = {10,4,5,8,7,2,12,3,1,6,9,11};
        printarray(A);
        int secondMax = findSecondRecursive(A.length,A);
        Arrays.sort(A);
        System.out.println("Sorted array(for check use): ");
        printarray(A);
        System.out.println("Second largest number in A: " + secondMax);
    }
}

At the end of this paper, they make a bizarre claim, `We state the following theorem without proof. Theorem. Finding the largest number in an array of N numbers requires at least N + ⌈log2(N)⌉ − 2 comparisons.` How can that be true? Finding the largest element in an array is a linear scan requiring N-1 comparisons. — Tommy, Jul 12 '19 at 14:29
(@Tommy: Given the paragraph title reads *Lower Bound for Finding Second Largest Number*, that assertion/theorem simply lacks a *2nd*.) — greybeard, Sep 27 '21 at 20:20

score 1 · Answer 4 · answered Feb 06 '13 at 02:31

the problem is: let's say, in comparison level 1, the algorithm need to be remember all the array element because largest is not yet known, then, second, finally, third. by keep tracking these element via assignment will invoke additional value assignment and later when the largest is known, you need also consider the tracking back. As the result, it will not be significantly faster than simple 2N-2 Comparison algorithm. Moreover, because the code is more complicated, you need also think about potential debugging time.
eg: in PHP, RUNNING time for comparison vs value assignment roughly is :Comparison: (11-19) to value assignment: 16.

score 0 · Answer 5 · answered May 15 '14 at 04:56

I shall give some examples for better understanding. :
example 1 :
>12 56 98 12 76 34 97 23
>>(12 56) (98 12) (76 34) (97 23)
>>> 56 98 76 97 
>>>> (56 98) (76 97)
>>>>> 98 97
>>>>>> 98   

The largest element is 98

Now compare with lost ones of the largest element 98. 97 will be the second largest.

score 0 · Answer 6 · edited Jun 26 '14 at 12:12

nlogn implementation

      public class Test {
           public static void main(String...args){
              int arr[] = new int[]{1,2,2,3,3,4,9,5, 100 , 101, 1, 2, 1000, 102, 2,2,2};
              System.out.println(getMax(arr, 0, 16));
           }

        public static Holder getMax(int[] arr, int start, int end){
           if (start == end)
             return new Holder(arr[start], Integer.MIN_VALUE);
        else {
          int mid = ( start + end ) / 2;
          Holder l = getMax(arr, start, mid);
          Holder r = getMax(arr, mid + 1, end);

          if (l.compareTo(r) > 0 )
            return new Holder(l.high(), r.high() > l.low() ? r.high() : l.low());
          else
            return new Holder(r.high(), l.high() > r.low() ? l.high(): r.low());
        }
      }

    static class Holder implements Comparable<Holder> {
      private int low, high;
      public Holder(int r, int l){low = l; high = r;}

    public String toString(){
      return String.format("Max: %d, SecMax: %d", high, low);
    }

    public int compareTo(Holder data){
      if (high == data.high)
        return 0;

      if (high > data.high)
        return 1;
      else
        return -1;
    }

    public int high(){
      return high;
    }
    public int low(){
      return low;
    }
  }

}

if you have an n log n implementation, might as well use merge sort and take merged[-2]. That would be n log n — Tommy, Jul 12 '19 at 14:26

score 0 · Answer 7 · answered Oct 21 '16 at 18:01

0

Why not to use this hashing algorithm for given array[n]? It runs c*n, where c is constant time for check and hash. And it does n comparisons.

    int first = 0;
    int second = 0;
    for(int i = 0; i < n; i++) {
        if(array[i] > first) {
            second = first;
            first = array[i];
        }
    }

Or am I just do not understand the question...

answered Oct 21 '16 at 18:01

DeeDee

1
1
1

This approach needs more comparisons to work correctly. In your example `second` be equal to `0` for array `[3,2]` – Nikita Kukushkin Jul 23 '21 at 13:25

score 0 · Answer 8 · answered Nov 07 '16 at 18:24

In Python2.7: The following code works at O(nlog log n) for the extra sort. Any optimizations?

def secondLargest(testList):
    secondList = []
    # Iterate through the list
    while(len(testList) > 1):
        left = testList[0::2]
        right = testList[1::2]

        if (len(testList) % 2 == 1):
            right.append(0)

        myzip = zip(left,right)
        mymax = [ max(list(val)) for val in myzip ]
        myzip.sort()
        secondMax = [x for x in myzip[-1] if x != max(mymax)][0]

        if (secondMax != 0 ):
            secondList.append(secondMax)
        testList = mymax

    return max(secondList)

SudiptaDas · Answer 9 · 2015-11-23T14:30:56.580

public static int FindSecondLargest(int[] input)
        {
            Dictionary<int, List<int>> dictWinnerLoser = new Dictionary<int, List<int>>();//Keeps track of loosers with winners
            List<int> lstWinners = null;
            List<int> lstLoosers = null;

            int winner = 0;
            int looser = 0;

            while (input.Count() > 1)//Runs till we get max in the array
            {
                lstWinners = new List<int>();//Keeps track of winners of each run, as we have to run with winners of each run till we get one winner

                for (int i = 0; i < input.Count() - 1; i += 2)
                {
                    if (input[i] > input[i + 1])
                    {
                        winner = input[i];
                        looser = input[i + 1];
                    }
                    else
                    {
                        winner = input[i + 1];
                        looser = input[i];
                    }

                    lstWinners.Add(winner);


                    if (!dictWinnerLoser.ContainsKey(winner))
                    {
                        lstLoosers = new List<int>();
                        lstLoosers.Add(looser);
                        dictWinnerLoser.Add(winner, lstLoosers);
                    }
                    else
                    {
                        lstLoosers = dictWinnerLoser[winner];
                        lstLoosers.Add(looser);
                        dictWinnerLoser[winner] = lstLoosers;
                    }
                }

                input = lstWinners.ToArray();//run the loop again with winners
            }

            List<int> loosersOfWinner = dictWinnerLoser[input[0]];//Gives all the elemetns who lost to max element of array, input array now has only one element which is actually the max of the array

            winner = 0;

            for (int i = 0; i < loosersOfWinner.Count(); i++)//Now max in the lossers of winner will give second largest
            {
                if (winner < loosersOfWinner[i])
                {
                    winner = loosersOfWinner[i];
                }
            }

            return winner;
        }

I didn't downvote, but a bit of commentary to go along with the code may help you here. — LDMJoe, Nov 19 '15 at 20:17
Only down voting does not help, my code meets all the requirement of the problem statement and is of the desired time complexity. Please give the reasons of your down vote. — SudiptaDas, Nov 23 '15 at 14:33

Find Second largest number in array at most n+log₂(n)−2 comparisons

9 Answers9

Linked