40

Say I have a matrix (MxN) which has its rows and columns sorted.

  1. All the elements in each row are arranged in increasing order
  2. All the elements in each column are arranged in increasing order
  3. All the elements are integers
  4. No other assumptions can be made

    Example:

    [1 5 8 20]

    [2 9 19 21]

    [12 15 25 30]

I have to find if a given number is present in the matrix or not (Basic search). I have an algorithm which runs O(n)

int row = 0;
int col = N-1;

while (row < M && col >= 0) {
  if (mat[row][col] == elem) { 
    return true;
  } else if (mat[row][col] > elem) { 
    col--;
  } else { 
    row++;
  } 
}

But I was asked an O(log (MxN)) == O(Log(n)) solution. Any ideas??

noMAD
  • 7,744
  • 19
  • 56
  • 94
  • what do you know about the matrix going in, other than it being sorted (like its row/col size, perhpas?) – Jordan May 14 '12 at 19:43
  • @Yoel: Well, its can be huge, only integers, can have negative numbers. Anything specific you looking for? – noMAD May 14 '12 at 19:45
  • Is the first element of each row guaranteed to be larger than the last element in the previous row? (As in your example) In this case the problem is trivial with a modified binary search – BrokenGlass May 14 '12 at 19:50
  • @BrokenGlass: Nope, that's not guaranteed at all. The only condition is the rows are sorted in ascending order and so are the columns. – noMAD May 14 '12 at 19:53
  • @phix23: I am not sure about that. Didn't think of it actually. But if `n = MxN` how would you do the search by considering it to be an array? – noMAD May 14 '12 at 19:54
  • Could you clarify what the invariant property of the matrices is? How are they "sorted"? Is A_(i)_(j) always going to be no less than A_(i-1)_(j) and A_(i)_(j-1)? Are there any other conditions? – Kaganar May 14 '12 at 21:34
  • @Kaganar: I have edited the question for you. – noMAD May 14 '12 at 22:26

5 Answers5

76

O(log (M * N)) solution is not possible for this task.

Let's look at a simplified task: in "sorted" square matrix assume all elements above secondary diagonal (green) less than given number, all elements below secondary diagonal (red) greater than given number, and no additional assumptions for elements on secondary diagonal (yellow).

enter image description here

Neither original assumptions of this task, nor these additional assumptions tell us how elements on secondary diagonal are related to each other. Which means we just have an unsorted array of N integers. We cannot find given number in the unsorted array faster than O(N). So for original (more complicated) problem with square matrix we cannot get a solution better than O(N).

For a rectangular matrix, stretch the square picture and set the additional assumptions accordingly. Here we have min(N,M) sorted sub-arrays of size max(N,M)/min(N,M) each. The best way to search here is to use linear search to find one or several sub-arrays that may contain given value, then to use binary search inside these sub-arrays. In the worst case it is necessary to binary-search in each sub-array. Complexity is O(min(N,M) * (1 + log(max(N,M) / min(N,M)))). So for original (more complicated) problem with rectangular matrix we cannot get a solution better than O(min(N,M) * ( 1 + log(max(N,M)) - log(min(N,M)))).

Evgeny Kluev
  • 24,287
  • 7
  • 55
  • 98
  • 11
    surprised no-one's posted it here yet but [this](http://twistedoakstudios.com/blog/Post5365_searching-a-sorted-matrix-faster) confirms your bound on the complexity and outlines an algorithm that performs in that time. – Mike H-R Aug 13 '13 at 14:27
  • 2
    Plus he gets a [Competence Point](http://twistedoakstudios.com/blog/Post5365_searching-a-sorted-matrix-faster) – recursion.ninja Dec 01 '13 at 20:56
  • 2
    Since we cannot solve the `M==N` case in zero time, the complexity should be given as `O(min(N,M) * (1 + log(max(N,M)) - log(min(N,M))))`. – hardmath Jun 22 '14 at 12:49
7

It is not possible to do better than O(n). Some guys (there are at least three of them on this page) think they can do better but that's because their algorithms are wrong or because they don't know how to compute the complexity of their algorithm so they try to guess it. This blog post is very good and will explain you the errors of these guys.

Draft of a proof that O(n) is optimal: consider the following matrix:

1     2     3     4     5     6 … (n-2)  (n-1) (n+1)
2     3     4     5     6     7 … (n-1)  (n+1) (n+2)
3     4     5     6     7     8 … (n+1)  (n+2) (n+3)
…     …     …     …     …     … … …      …     …
(n-2) (n-1) …     …     …     … … …      …     (2n-1)
(n-1) (n+1) …     …     …     … … …      …     2n
(n+1) (n+2) …     …     …     … … (2n-1) 2n    (2n+1)

If you are looking for n in this matrix you must check at least once for each row if n is in the row because n could be in any row. (The proof is not complete but here is the idea)

Thomash
  • 6,339
  • 1
  • 30
  • 50
  • 1
    +1 I would have accepted this answer if the proof completely convinced me. But thanks for that lovely link. :-) – noMAD May 14 '12 at 23:09
  • @Thomash: Though it is almost 1 year. But I am just curious about the algorithm complexity computed in the above blog post for "Quad Partition" method by the author. As we are dividing the matrix in "4" submatrix and discarding one of these four. So shouldn't be T(n) = 3*T(n/4)+c instead of T(n) = 3*T(n/2)+c(this computed by author) – Manish Jul 27 '13 at 23:24
  • 1
    @Manish the `n` in `T(n)` is the number of rows (or columns) in the matrix, not the number of cells (which is `n^2`). The 3 smaller matrices have `n/2` rows, which means that they have `n^2/4` cells. – Thomash Jul 27 '13 at 23:42
  • @Thomash Yeah, got it. I just read the comments below the post. Thank you for the link.. – Manish Jul 27 '13 at 23:55
  • 1
    @Thomash the link is broken, can you please redirect? – aerin Apr 06 '17 at 23:16
  • @Thomash for your answer, you are referring to a link that is not working anymore, do you mind editing your answer or find a similar link? – Mahshid Zeinaly Jun 26 '17 at 19:54
  • I've found http://sq8alg.blogspot.fr/2012/12/searching-2d-sorted-matrix-part.html that has the same article. – Thomash Jun 29 '17 at 09:04
4

You have to use recursion to solve this problem. Given a matrix X and number y, you can do binary search for y on the middle row of X and divide the matrix into four parts such that:

A|B
---
C|D

all elements in A are less than y, all elements in D are greater than y, and y can be in B and C. Iteratively find y in B and C.

Since height(A)=height(B)\approx= height(C)=height(D), size(X)>= 2*(size(B)+size(C)) . So the resulting complexity if O(logn).

def find(X,y):
    a,b = X.shape
    i = a /2
    j = binsearch(X[i,:], y)
    if X[i,j]==y:
        return True
    else:
        return find( X[ (i+1):a, 0:(j-1)], y ) or find( X[ 0:i, j:b], y )
ElKamina
  • 7,747
  • 28
  • 43
  • 2
    A fully illustrated solution is also presented here: http://www.leetcode.com/2010/10/searching-2d-sorted-matrix-part-ii.html – BrokenGlass May 14 '12 at 22:00
  • "Since height(A)=height(B)\approx= height(C)=height(D), size(X)>= 2*(size(B)+size(C)) . So the resulting complexity if O(logn)." -> No – Thomash May 14 '12 at 22:25
  • 4
    You have T(n)=2*T(n/2)+O(1), I am sorry but O(Log(n)) is not a solution for this equation. – Thomash May 14 '12 at 22:26
  • @ElKamina: This method is a good one but as Thomash pointed it out, the calculation of your run time is flawed. – noMAD May 14 '12 at 23:06
  • @Thomash It is *not* T(n)=2*T(n/2)+O(1), but T(n)=2*T(n/**4**)+O(1). SO it is O(logn) – ElKamina May 15 '12 at 00:02
  • @ElKamina You define `n` as number of _elements_ in the matrix. OP defines `n` as number of _rows/columns_ in the matrix. That's why the argument is actually `n/2`, not `n/4`. – Skiminok May 15 '12 at 00:41
  • 2
    @ElKamina anyway O(Log(n)) is not a solution of T(n)=2*T(n/4)+O(1). – Thomash May 15 '12 at 05:50
2

Since both rows and columns are sorted, if we look at the first element of each row we can find which one contains the number we're looking for. Then, again, we can exploit the fact that the elements in each row are sorted and find that number.
The fastest search algorithm I know is Binary Search, which has a complexity of O(log n), so the total complexity will be O(log m + log n).
Here's an example, suppose we're looking for 28:

 1  2  3  4  5  6  7  8  9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
  • We do a binary search over the elements of the first column (1, 11, 21, 31, 41) and find that the row is the third, because its first element is smaller than our number but the next row's first element is larger. Number of steps: 2 (21, 31, found)
  • We do a binary search again over the third row (21, 22, 23, 24, 25, 26, 27, 28, 29, 30) and find our number. Number of steps: 2 - 3 (25, 27 or 28, found)
Bassam Mehanni
  • 14,796
  • 2
  • 33
  • 41
BlackBear
  • 22,411
  • 10
  • 48
  • 86
  • 1
    umm.. I think that's `O(MLog N)` isn't it? For each row `(M)` + search `(log N)`. – noMAD May 14 '12 at 19:47
  • @noMAD: No, maybe I misexplained myself. You do only 2 searches – BlackBear May 14 '12 at 19:48
  • Then I don't think I understood you. Could you please rephrase? – noMAD May 14 '12 at 19:55
  • @noMAD: check now. Sorry for my english ;) – BlackBear May 14 '12 at 20:05
  • 2
    This solution won't work since according to OP it is not guaranteed that a row starts with a number that is larger than the last column value in the previous row - the example matrix given is just unfortunate – BrokenGlass May 14 '12 at 20:22
  • @BlackBear: Check for my edit. I have provided another example of a sorted matrix. Your solution will not work for that. BrokenGlass is right. – noMAD May 14 '12 at 20:26
0

I think this can be done in O(log(n*n)*log(n)) time, where n is the no. of the rows of a square matrix.

By the properties of Matrix, the principal diagonal of the matrix is a sorted array. So, we can search an element or its lower bound in O(log(n)). Now, using this element as pivot, we have 4 sub-matrix. and we can say that all the elements in sub-matrix(top-left) are smaller, all the elements in sub-matrix (lower-right) are bigger. So, we can remove that from the search space.

Now, recursively search in sub-matrix (top-right) and in sub-matrix(lower-left).

Since, at each step, we perform a log(n) search (along principal diagonal) ad there can be atmost log(n*n) steps (as we reduce the search space by half in each step).

So, Time complexity = O(log(n)log(nn)).

Please correct, if anything is wrong.

Refrences - [Book]Cracking the coding interview (Question 11.6)

mridul
  • 105
  • 2
  • 6