8

I have an assignment to write an algorithm (not in any particular language, just pseudo-code) that receives a matrix [size: M x N] that is sorted in a way that all of it's rows are sorted and all of it's columns are sorted individually, and finds a certain value within this matrix. I need to write the most time-efficient algorithm I can think of.

The matrix looks something like:

1  3  5
4  6  8
7  9 10

My idea is to start at the first row and last column and simply check the value, if it's bigger go down and if it's smaller than go left and keep doing so until the value is found or until the indexes are out of bounds (in case the value does not exist). This algorithm works at linear complexity O(m+n). I've been told that it's possible to do so with a logarithmic complexity. Is it possible? and if so, how?

Bob
  • 185
  • 1
  • 4
  • could you possibly share an example of data? Surely you were given a sample. – jcolebrand Nov 09 '10 at 20:25
  • "all of its rows are sorted and all of its columns are sorted individually": What does this mean? – TonyK Nov 09 '10 at 20:32
  • the values in every row are sorted and the values in every column are sorted – Bob Nov 09 '10 at 20:38
  • I think he means that the value in the top left (1,1) will be the smallest, and the value at the bottom right (n,m) will be the largest. The rows and columns are both sorted. – Sagar Nov 09 '10 at 20:40
  • @sagar but that's not the example given by the professor. otherwise he had the fastest method above (check the end of the row first, then proceed) additionally, checking the end of the middlest row first would be faster, a bit of a binary search. – jcolebrand Nov 09 '10 at 20:42
  • aah right. didn't see the 8/7 issue – Sagar Nov 09 '10 at 20:47
  • possible duplicate of [Given a 2d array sorted in increasing order from left to right and top to bottom, what is the best way to search for a target number?](http://stackoverflow.com/questions/2457792/given-a-2d-array-sorted-in-increasing-order-from-left-to-right-and-top-to-bottom) – Steve Jessop Nov 09 '10 at 22:25
  • Dupe question has an answer proving Big_Omega(min(N,M)) lower complexity bound: http://stackoverflow.com/questions/2457792/given-a-2d-array-sorted-in-increasing-order-from-left-to-right-and-top-to-bottom/2468729#2468729. So no logarithmic solution when N and M are close. Your example data looks potentially better, in the sense that it also has the property that each diagonal is sorted, but you don't state that constraint in the question. – Steve Jessop Nov 09 '10 at 22:26
  • @steveJessop that's because he's going by what was given him by his professor. Additionally it seems to be a class focusing on algos and not on implementation. – jcolebrand Nov 09 '10 at 22:46
  • Did you ever get this resolved successfully? Do you still need help with this? – jcolebrand Dec 14 '10 at 04:06
  • Possible duplicate of [Given a 2d array sorted in increasing order from left to right and top to bottom, what is the best way to search for a target number?](https://stackoverflow.com/questions/2457792/given-a-2d-array-sorted-in-increasing-order-from-left-to-right-and-top-to-bottom) – Bernhard Barker Jan 02 '18 at 12:34

9 Answers9

4

Your matrix looks like this:

a ..... b ..... c
. .     . .     .
.   1   .   2   . 
.     . .     . .
d ..... e ..... f
. .     . .     .
.   3   .   4   .
.     . .     . .
g ..... h ..... i

and has following properties:

a,c,g < i  
a,b,d < e
b,c,e < f
d,e,g < h
e,f,h < i

So value in lowest-rigth most corner (eg. i) is always the biggest in whole matrix and this property is recursive if you divide matrix into 4 equal pieces.

So we could try to use binary search:

  1. probe for value,
  2. divide into pieces,
  3. choose correct piece (somehow),
  4. goto 1 with new piece.

Hence algorithm could look like this:

input: X - value to be searched
until found
 divide matrix into 4 equal pieces
 get e,f,h,i as shown on picture
 if (e or f or h or i) equals X then 
   return found
 if X < e then quarter := 1
 if X < f then quarter := 2
 if X < h then quarter := 3
 if X < i then quarter := 4
 if no quarter assigned then 
    return not_found
 make smaller matrix from chosen quarter 

This looks for me like a O(log n) where n is number of elements in matrix. It is kind of binary search but in two dimensions. I cannot prove it formally but resembles typical binary search.

Michal Sznajder
  • 9,338
  • 4
  • 44
  • 62
  • The only problem with this is that the final matrix is of MxN size, and it doesn't look (at first glance) as tho this method scales that well. However, you may be on to something, maybe the folks at math.stackexchange.com can help you flush it out? – jcolebrand Nov 09 '10 at 21:50
  • 4
    "if X < f then quarter := 2" - doesn't follow, unfortunately. Quarter 3 can contain values which are larger than e, but smaller than f. For instance, the 7 in the example input. – Steve Jessop Nov 09 '10 at 22:13
  • 1
    It would be quite surprising if this was O(log n) when that is the optimal complexity for searching a sorted one-dimensional array, which is what we would have if the entire MxN matrix was sorted with all elements on line i smaller than all elements on i+1 for all lines. The complexity has to be worse than that. – Martin G May 05 '14 at 12:33
2

and that's how the sample input looks? Sorted by diagonals? That's an interesting sort, to be sure.

Since the following row may have a value that's lower than any value on this row, you can't assume anything in particular about a given row of data.

I would (if asked to do this over a large input) read the matrix into a list-struct that took the data as one pair of a tuple, and the mxn coord as the part of the tuple, and then quicksort the matrix once, then find it by value.

Alternately, if the value of each individual location is unique, toss the MxN data into a dictionary keyed on the value, then jump to the dictionary entry of the MxN based on the key of the input (or the hash of the key of the input).

EDIT:

Notice that the answer I give above is valid if you're going to look through the matrix more than once. If you only need to parse it once, then this is as fast as you can do it:

for (int i = 0; i<M; i++)
 for (int j=0; j<N; j++)
  if (mat[i][j] == value) return tuple(i,j);

Apparently my comment on the question should go down here too :|

@sagar but that's not the example given by the professor. otherwise he had the fastest method above (check the end of the row first, then proceed) additionally, checking the end of the middlest row first would be faster, a bit of a binary search.

Checking the end of each row (and starting on the end of the middle row) to find a number higher than the checked for number on an in memory array would be fastest, then doing a binary search on each matching row till you find it.

jcolebrand
  • 15,889
  • 12
  • 75
  • 121
  • well the diagonals do appear to be sorted, didn't notice that before. but the sort is not by diagonals, it is simpler. every row by itself is sorted and every column by itself is sorted. but like you said reading a row doesn't give any information about the next row. – Bob Nov 09 '10 at 20:38
  • nor a column about the next column. You can make no assumptions about the speed of finding an element, without iterating over every one. I'm trying to look for the fastest possible method of finding a value. An O(n) seems like the fastest, but if you're going to do what I suggested, that's only valid if you're looking for lots of the values. Otherwise, if you just parse it once, then it's faster to just for loop it twice on the MxN with an early abort on found. – jcolebrand Nov 09 '10 at 20:39
  • but as soon as you need to find two values, mine way given in the answer is faster than a double for loop – jcolebrand Nov 09 '10 at 20:40
  • that's also what I think. I think that O(n) is good enough in this case, I can't see any way to find it with a better complexity without adding another data structure. and i also do not need to search twice, in the assignment it states that if the value appears more than once I still need to find only one instance of it. – Bob Nov 09 '10 at 20:42
2

in log M you can get a range of rows able to contain the target (binary search on the first value of rows, binary search on last value of rows, keep only those rows whose first <= target and last >= target) two binary searches is still O(log M)
then in O(log N) you can explore each of these rows, with again, a binary search!

that makes it O(logM x logN)
tadaaaa

Jean-Bernard Pellerin
  • 12,556
  • 10
  • 57
  • 79
  • 1
    You can do one binary search, just keep the lower pointer on the left side, and the upper pointer on the right side. Everything between the two at the end is the set of rows to explore. – Kevin Stricker Nov 09 '10 at 20:58
  • 2
    to simplify math let's assume that the size is NxN. now let's take a bad case: let's say that all values in the first row are smaller that 10 and all values in the last row are bigger than 100 and i'm looking for a target=50. in this case i can't eliminate any row so I still have to binary-search n rows and that will give me O(n*logn) – Bob Nov 09 '10 at 21:11
  • @Bob ~ Yeah, we know, but you would then be starting the binary search from the last row to the first. You do know binary searches right? http://en.wikipedia.org/wiki/Binary_search_algorithm – jcolebrand Nov 09 '10 at 21:17
  • 2
    I second Bob, this answer is incorrect and deserves a down-vote. Worse case complexity is still O(M.Log(N)), the first search could very well return the whole set of rows. – Rabih Kodeih Aug 13 '13 at 15:36
  • Downvoted as well. O(n) notation stands for worst-case complexity and that is not O(logM x logN). You can argue that the expected time is lower with this solution, but it still maintains the O(n*log n) worst-case complexity. – dcasadevall Sep 25 '13 at 22:20
  • Yeah, Algorithm gives correct answer, but worst case runtime is totally wrong. – Vamsi Nerella Oct 16 '14 at 12:19
0
public static boolean find(int a[][],int rows,int cols,int x){
    int m=0;
    int n=cols-1;
while(m<rows&&n>=0){
    if(a[m][n]==x)
        return1;
    else if(a[m][n]>x)
        n--;
    else m++;
}

}
banjara
  • 3,800
  • 3
  • 38
  • 61
0

this is wrong answer

I am really not sure if any of the answers are the optimal answers. I am going at it.

  1. binary search first row, and first column and find out the row and column where "x" could be. you will get 0,j and i,0. x will be on i row or j column if x is not found in this step.
  2. binary search on the row i and the column j you found in step 1.

I think the time complexity is 2* (log m + log n).

You can reduce the constant, if the input array is a square (n * n), by binary searching along the diagonal.

Vamsi Nerella
  • 770
  • 6
  • 8
  • 1
    Sorry, this is wrong. Imagine a 4X4 matrix where the first row and first column both look like `1 2 3 10`, and you're requested to look for 7. You would say that it's on the 3rd row and 3rd col, but this isn't so. 7 could be at `(2, 2)` and all the rest of the cells can be 999... – ihadanny Apr 13 '15 at 16:35
0

what about getting the diagonal out, then binary search over the diagonal, start bottom right check if it is above, if yes take the diagonal array position as the column it is in, if not then check if it is below. each time running a binary search on the column once you have a hit on the diagonal (using the array position of the diagonal as the column index). I think this is what was stated by @user942640

you could get the running time of the above and when required (at some point) swap the algo to do a binary search on the initial diagonal array (this is taking into consideration its n * n elements and getting x or y length is O(1) as x.length = y.length. even on a million * million binary search the diagonal if it is less then half step back up the diagonal, if it is not less then binary search back towards where you where (this is a slight change to the algo when doing a binary search along the diagonal). I think the diagonal is better than the binary search down the rows, Im just to tired at the moment to look at the maths :)

by the way I believe running time is slightly different to analysis which you would describe in terms of best/worst/avg case, and time against memory size etc. so the question would be better stated as in 'what is the best running time in worst case analysis', because in best case you could do a brute linear scan and the item could be in the first position and this would be a better 'running time' than binary search...

dvr
  • 51
  • 7
0

Here is a lower bound of n. Start with an unsorted array A of length n. Construct a new matrix M according to the following rule: the secondary diagonal contains the array A, everything above it is minus infinity, everything below it is plus infinity. The rows and columns are sorted, and looking for an entry in M is the same as looking for an entry in A.

Yuval Filmus
  • 413
  • 4
  • 14
0

This is in the vein of Michal's answer (from which I will steal the nice graphic).

Matrix:

  min ..... b ..... c
    .       .       .
    .  II   .   I   . 
    .       .       .
    d .... mid .... f
    .       .       .
    .  III  .   IV  .
    .       .       .
    g ..... h ..... max

Min and max are the smallest and largest values, respectively. "mid" is not necessarily the average/median/whatever value.

We know that the value at mid is >= all values in quadrant II, and <= all values in quadrant IV. We cannot make such claims for quadrants I and III. If we recurse, we can eliminate one quadrant at each level.

Thus, if the target value is less than mid, we must search quadrants I, II, and III. If the target value is greater than mid, we must search quadrants I, III, and IV.

The space reduces to 3/4 its previous at each step:

n * (3/4)x = 1

n = (4/3)x

x = log4/3(n)

Logarithms differ by a constant factor, so this is O(log(n)).

find(min, max, target)
    if min is max
        if target == min
            return min
        else
            return not found
    else if target < min or target > max
        return not found
    else
        set mid to average of min and max 
        if target == mid
            return mid
        else
            find(b, f, target), return if found
            find(d, h, target), return if found

            if target < mid
                return find(min, mid, target)
            else
                return find(mid, max, target)
Community
  • 1
  • 1
kmantel
  • 105
  • 1
  • 5
0

JavaScript solution:

//start from the top right corner
//if value = el, element is found
//if value < el, move to the next row, element can't be in that row since row is sorted
//if value > el, move to the previous column, element can't be in that column since column is sorted

function find(matrix, el) {

  //some error checking
  if (!matrix[0] || !matrix[0].length){
    return false;
  }
  if (!el || isNaN(el)){
    return false;
  }

  var row = 0; //first row
  var col = matrix[0].length - 1; //last column

  while (row < matrix.length && col >= 0) {
    if (matrix[row][col] === el) { //element is found
      return true;
    } else if (matrix[row][col] < el) {
      row++; //move to the next row
    } else {
      col--; //move to the previous column
    }
  }

  return false;

}
spark
  • 79
  • 1
  • 1