Finding Nth largest element in huge 2D array

Question

I'm trying to find the N-th largest element in a large 2-D array( approximately 850,000 elements). The method I'm using now is converting it into a 1-D array and then using the selection sort algorithm and finding it that way but it takes way too long.

Does anybody know a good way to find the N-th largest element while just looking through the matrix without sorting it, like finding the largest element.

Keep a list of the N largest values. Initialise that list with the first N elements. Then walk the remainder of the list and if you find one bigger than you have so far, put it in the list, and push the lowest one out. — David Heffernan, Jan 30 '14 at 17:49
Nice approach @DavidHeffernan! I was just writing pseudocode for the same thing. — BrettFromLA, Jan 30 '14 at 17:56

CS Pei · Answer 1 · 2014-01-30T17:56:34.593

3

I think this is some kind of homework or interview question, so I would just lay out the steps for you.

find an appropriate structure to store N nodes. Say X. (hint, fast search , insert, delete...)
travel through the matrix in one pass, save the value larger than those in your X, and save it to X, and delete the minimum in X.
at the end, the minimum of X is the N-largest.

The the extra space is O(N), time is O(size of the matrix).

edited Jan 30 '14 at 17:56

answered Jan 30 '14 at 17:48

CS Pei

10,869
1
27
46

1

Terrific for finding the **largest** element. Not so much for finding the **N-th largest** (or smallest, etc.) – WhozCraig Jan 30 '14 at 17:49
@WhozCraig the N-th largest is the minimum of John Smith's X structure. – DSquare Jan 30 '14 at 17:50
@DSquare Ah.. ok. That makes more sense. – WhozCraig Jan 30 '14 at 17:51
yes, I should make that clear. – CS Pei Jan 30 '14 at 17:52
@JohnSmith its pretty clear now, its the `N-nodes` clause in your step-1. Sry about my confusion (been a decaf morning). The difficulty is the efficiency of managing that list, and pushing nodes out the bottom as new winners are nominated at the top. – WhozCraig Jan 30 '14 at 17:54

score 1 · Answer 2 · answered Jan 30 '14 at 18:01

1

I think randomized quickselect should be fine

https://en.wikipedia.org/wiki/Quickselect

average case O(n)

answered Jan 30 '14 at 18:01

chrk

4,037
2
39
47

score 0 · Answer 3 · edited May 23 '17 at 10:25

0

Unless you change your data structure to a better one (either a sorted array, or something else), it will be quite hard for you to not iterate through all elements of the array.

From this stackoverflow question (Print largest number in a 2d array - why do my code print three numbers) i got this code and move it to C language:

#include<limits.h>

(...)

int twodArray[XXX][YYY];
int maxValue = INT_MIN;
int yForMaxValue, xForMaxValue;
int i = 0;
    for (i = 0; i < XXX; i++) {
        for (int j = 0; j < YYY; j++)
            if (twodArray[i][j] > maxValue) {
                maxValue = twodArray[i][j];
                yForMaxValue = j;
                xForMaxValue = i;
            }
        }
    }

yForMaxValue and yForMaxValue are the index of the position to the maxValue.

edited May 23 '17 at 10:25

Community

1
1

answered Jan 30 '14 at 17:52

Lauro Wolff Valente Sobrinho

841
7
17

This finds the largest, not the N-th largest, right? – CS Pei Jan 30 '14 at 17:54
Yep, correct. If you want the N-th largest, you will also need to store the i and j values when the if is validated. I will edit the answer. – Lauro Wolff Valente Sobrinho Jan 30 '14 at 17:55
I am not convinced yet. – CS Pei Jan 30 '14 at 17:57
Why arent you convinced? :( – Lauro Wolff Valente Sobrinho Jan 30 '14 at 18:00
1

I'm not convinced either. Because when this is finished you've isolated *an* index (in this case an "index" is an x,y-pair); not *N* indexes. Ultimately you need to keep those somewhere, maintained as the list is scanned, and updated accordingly. That is the step that makes the algorithm considerably more complicated than you have here. For example, suppose the `maxValue` is the *first* value at (0,0). With this algorithm the if-condition will only ever be satisfied **once**, and your "store" of pairs that succeed in the if-clause will be *one* element, not N. – WhozCraig Jan 30 '14 at 18:07

chux - Reinstate Monica · Answer 4 · 2014-01-31T16:29:14.223

A priority queue with a limited size of N should do the job.

Put each element of the 2D array in the queue.
O(Arraysize * log2(N)) complexity.
Then the first element in the queue is the Nth smallest (assuming arry size >= N)

#include <stdio.h>
#include <stdlib.h>
typedef int PriorityQueue_T;

typedef struct {
  PriorityQueue_T *A;
  size_t Length;
  size_t Size;
} PriorityQueue;

void PriorityQueue_Remove(PriorityQueue *Q) {
  if (Q->Length > 0) {
    size_t Index = 0;
    while (1) {
      size_t Mother = Index * 2 + 1;
      size_t Father = Mother + 1;
      if (Mother >= Q->Length) break;
      if ((Q->A[Mother] < Q->A[Father]) || (Father >= Q->Length)) {
        Q->A[Index] = Q->A[Mother];
        Index = Mother;
      } else {
        Q->A[Index] = Q->A[Father];
        Index = Father;
      }
    }
    Q->Length--;
    Q->A[Index] = Q->A[Q->Length];
    while (Index > 0) {
      size_t Child = (Index + 1)/ 2 - 1;
      if (Q->A[Index] >= Q->A[Child]) {
        break;
      }
      PriorityQueue_T tmp = Q->A[Index];
      Q->A[Index] = Q->A[Child];
      Q->A[Child] = tmp;
      Index = Child;
    }
  }
}

void PriorityQueue_Insert(PriorityQueue *Q, PriorityQueue_T x) {
   if (Q->Length < Q->Size || x > Q->A[0]) {
    if (Q->Length >= Q->Size) {
      PriorityQueue_Remove(Q);
    }
    size_t Index = Q->Length++;
    Q->A[Index] = x;
    while (Index > 0) {
      size_t Child = (Index + 1)/ 2 - 1;
      if (Q->A[Index] >= Q->A[Child]) {
        break;
      }
      PriorityQueue_T tmp = Q->A[Index];
      Q->A[Index] = Q->A[Child];
      Q->A[Child] = tmp;
      Index = Child;
    }
  }
}

// Pseudo code here to end
void PQ_FindNthSmallest(PriorityQueue_T Array[W][H], size_t N) {
  PriorityQueue Q;
  Q.Length = 0;
  Q.Size = N;
  Q.A = malloc(Q.Size * sizeof(PriorityQueue_T));

  For each element in the array
    PriorityQueue_Insert(&Q, Array[x][y]);

  printf("Nth smallest: %CorrespondingFormat", Q.A[0]);
  free(Q.A);
}

Finding Nth largest element in huge 2D array

4 Answers4