The Frederickson and Johnson algorithm works roughly as follows. (It always reminds me of Quickselect.) This is from an implementation that used a different implementation as a template, so the details might be a little different than in their papers, but it is the same time complexity and the same idea. I'm assuming the legal values for k
run from 0
to M * N - 1
.
We will access the element in the i
th row and j
th column as A[i, j]
(counting from 0). In addition, we will pretend we can access A[i, -1]
which is negative infinity and A[i, N]
which is positive infinity. We maintain two arrays of indices left[0 .. M)
and right[0..M)
, and two variables lessthanleft
and greaterthanright
, with the following properties:
- There exists a matrix element
A[i0, j0]
so that for all rows i
, we have A[i, left[i] - 1] < A[i0, j0] <= A[i, left[i]]
. That is, left[i]
is roughly the position where A[i0, j0]
would be if it were in row i
.
- The sum of the values
left[i]
, for i
from 0
to M - 1
, is lessthanleft
. Note that lessthanleft
is the number of matrix elements in the positions to the left of the position indicated by left
-- that is, matrix elements that are strictly less than A[i0, j0]
.
lessthanleft <= k
. So this says that the element we're looking for is in some row i
to the right of left[i]
.
- There exists a matrix element
A[i1, j1]
so that for all rows i
, we have A[i, right[i] - 1] < A[i1, j1] <= A[i, right[i]]
. That is, right[i]
is roughly the position where A[i1, j1]
would be if it were in row i
.
- The sum of the values
N - right[i]
, for i
from 0
to M - 1
, is greaterthanright
. Note that greaterthanright
is the number of matrix elements in the positions to the right of the position indicated by right
-- that is, matrix elements that are strictly greater than A[i1, j1]
.
greaterthanright <= M * N - k
. So this says that the element we're looking for is in some row i
to the left of right[i]
.
These properties can be initialized by setting every element of left
to 0
, every element of right
to N
, and both lessthanleft
and greaterthanright
to 0
-- unless k = 0
(in which case we return A[0, 0]
) or k = M*N - 1
(in which case we return A[M-1, N-1]
). This corresponds, by the way, to i0=0
, j0=0
, i1=M-1
, j1=N-1
.
Now in every iteration, we pick a pivot matrix element A[i2, j2]
with left[i2] <= j2 < right[i2]
. (There are several strategies; I'll talk about this below.) We use arrays less[0..M)
and lessequal[0..M)
which we will fill in the following inner loop, and set variables nless
and nequal
and ngreater
to 0
. In the inner loop we iterate over all rows i
, and set less[i]
and lessequal[i]
so that A[i, less[i] - 1] < A[i2, j2] <= A[i, less[i]]
and A[i, lessequal[i] - 1] <= A[i2, j2] < A[i, lessequal[i]]
. (Since less[i-1] >= less[i]
, you can use the values of less[i-1]
and lessequal[i-1]
and start linear search to the left from there to make this take O(N)
time overall.) After each such step we add less[i] - left[i]
to nless
, we add lessequal[i] - less[i]
to nequal
, and we add right[i] - lessequal[i]
to ngreater
.
After this inner loop, we check to see if lessthanleft + nless >= k
. If this is the case, we continue with the entries less than A[i2, j2]
by setting right
to be less
(by pointer flipping if you want to prevent allocation in the loop for the arrays, or by copying values from less
to right
) and greaterthanright
to be greaterthanright + ngreater + nequal
, and continuing with the next iteration. If lessthanleft + nless + nequal < k
, then we continue with the entries greater than A[i2, j2]
by setting left
to be lessequal
and lessthanleft
to be lessthanleft + nless + nequal
, and continuing with the next iteration. Otherwise, the entry we're looking for is among the entries equal to A[i2, j2]
, so we return A[i2, j2]
.
Now about picking the pivot. One way would be to pick a random number c
in the interval [0 .. N * M - lessthanleft - greaterthanright)
; we then find the row containing the c
th matrix element between left
and right
, by subtracting left[i] - right[i]
from c
for each i
starting from 0
until it becomes negative. Now select the i
where it becomes negative and let j = round((left[i] + right[i])/2)
. Alternatively you could do a median-of-medians style computation on the medians of the remaining entries in each row.
So generally, the set of matrix elements that is in between left[i]
and right[i]
on each row gets split, hopefully roughly in half on every iteration. The complexity analysis is similar to the Quickselect one, where the expected number of iterations is "almost certainly" (in the mathematical sense) logarithmic in the number of values in the initial pool. By using a median-of-medians style pivot choice, I believe you could achieve this with certainty while paying for it in practical runtime, but I'll assume for now that "almost certainly" is good enough. (This is the same condition under which Quicksort is O(N lg(N)).) Any individual iteration takes O(M)
time to pick a pivot, then O(N)
time in the inner loop. The initial pool of candidates has M*N
members, so the number of iterations is almost certainly O(lg(M * N)) = O(lg(M) + lg(M))
, for a total complexity of O(N * (lg(M) + lg(N)))
. The algorithm that uses a heap with an entry for every row takes O(k * lg(M))
, which is much worse since k
is only bounded by N*M
.
Here's a simple example: consider M=4
, N=5
, and we need to pick the k=11
th element out of the matrix A
given as follows:
6 12 17 17 25
9 15 17 19 30
16 17 23 29 32
23 29 35 35 39
We initialize left
to [0,0,0,0]
, right
to [5,5,5,5]
, lessthanleft
and greaterthanright
both to 0
.
Suppose for the first iteration we pick A[1, 2]=17
as the pivot. During the inner loop we start examining the first row from entry right[0]-1=4
to the left, finding the first occurrence of a number that is at most 17
. This is in column 3
, so we set lessequal[0]=3+1=4
. We now continue searching for the first occurrence of a number that is strictly less than 17
; this occurs in column 1
so we set less[0]=1+1=2
. For the next row, we can start looking for a value that is at most 17
at column lessequal[0]
, and we find it in column 2
so set lessequal[1]=2+1=3
. Looking for a value that is strictly less than 17
starting from less[0]
, we set less[1] = 2
. Continuing we get less = [2,2,1,0]
and lessequal = [4,3,2,0]
. Hence nless = 5
, nequal = 4
, and ngreater = 11
. We have lessthanleft + nless + nequal = 9 < 11
, so we continue with the entries greater than 17
and set left = lessequal = [4,3,2,0]
and lessthanleft = 9
.
For the next iteration, we need to pick a pivot on some row i
in the centre between left[i]
and right[i]
. Maybe we pick row 2
, which means we have A[2,3] = 29
. During the inner loop, we now get less = [5,4,3,1]
and lessequal = [5,4,4,2]
with nless = 4
and nequal = 2
and ngreater = 5
. Now lessthanleft + nless = 13 > 11
, so we continue with the entries less than 29
and set right = less = [5,4,3,1]
and greaterthanright = 7
.
For the third iteration, every row now has only one entry left. We pick a row at random - maybe row 3
. The pivot is A[3,0] = 23
. During the inner loop, we get less = [4,4,2,0]
and lessequal = [4,4,3,1]
. Hence nless = 1
, nequal = 2
, and ngreater = 1
. We now have lessthanleft + nless = 10 < k = 11 <= lessthanleft + nless + nequal = 12
, so we return 23
. Indeed, there are 10 matrix entries strictly less than 23
(those to the left of A[i, less[i]]
in the last iteration) and 12 matrix entries that are at most 23
(those to the left of A[i, lessequal[i]]
in the last iteration).