Suppose you have one matrix of height h and width w (h,w <= 500) You have to find two submatrices of same size that are equal. Any idea for solving ?
-
2equal means each element in the submatrices are the same? – taocp Mar 19 '14 at 00:25
-
What about the type of elements in the matrix ? Are those numbers or can be characters/strings ? – Mohan Mar 19 '14 at 00:37
-
Should the submatrices be maximal? If not, it would be easy, you can just check whether there are two elements with the same item. – notbad Mar 19 '14 at 01:42
-
@notbad It's in the title, but not in the question text – Niklas B. Mar 19 '14 at 01:44
-
Elements are lowercase letters. – NightDev Mar 19 '14 at 01:51
-
@user3333971 What is the source of this problem? Is there some input distribution we can assume? What is the intended run time? – Niklas B. Mar 19 '14 at 01:53
-
1Do the submatrices need to be non-overlapping? – Juan Lopes Mar 19 '14 at 01:53
-
Somewhat related [Largest submatrix with equal no of 1's and 0's](http://stackoverflow.com/questions/13698298/largest-submatrix-with-equal-no-of-1s-and-0s). – jweyrich Mar 19 '14 at 03:50
-
@jweyrich that seems to be pretty unrelated – Niklas B. Mar 19 '14 at 03:59
2 Answers
There is an algorithm better than O(w2h2). Let consider an easier version first. Given a matrix M
with only lowercase letters, find the maximum sub-string in the matrix. This problem equals to find the longest common substring in M[0]$1M[1]...M[w-1]
, where we add distinct special characters between M[i]
and M[i+1]
. Using a suffix array, the longest common substring problem can be solved in linear time, for this case, it can be solved in O(wh).
For the largest submatrices problem, it can be reduces to the substring problem by enumerating all possible heights l<=h, at the same time, the lexicographical order two substring with height l
can inherit from the order of substring with height l-1
.
As @Niklas B explained. In the first iteration we rank the row suffixes of the matrix using the suffix array. Then in the second step, we rank the suffixes of adjacent 2-row combinations, using radix sort and by reusing the ranks computed in the first iteration. In the third step, we sort the suffixes of adjacent 3-rows etc. We can also maintain LCP arrays for each iteration so that we can find the longest substring that appears twice, using a single pass.
In total, this algorithm is O(h2w) with a linear time suffix array construction algorithm.
-
-
Yes, I begin to see what you mean by the second paragraph, but it's a bit hard to understand. – Niklas B. Mar 19 '14 at 03:12
-
I got it now. Very cool idea, but it's really, really hard to understand from your answer alone :/ – Niklas B. Mar 19 '14 at 03:20
-
Yes, so in the first iteration we rank the row suffixes of the matrix using the suffix array. Then in the second step, we rank the suffixes of adjacent 2-row combinations, probably using radix sort with the ranks of the first iteration. In the third step, we sort the suffixes of adjacent 3-rows etc. Is that about correct? – Niklas B. Mar 19 '14 at 03:22
-
-
1Sure, feel free. +1 for you, this is definitely a much better solution, OP should consider accepting this instead – Niklas B. Mar 19 '14 at 03:26
You can enumerate all vectors (-h < dh < h, 0 <= dw < w) in O(hw). For each such vector, translate the matrix by the vector (dh, dw) and subtract the translated matrix from the original one. In the range where they overlap you want to find the rectangle of zeroes that has the largest area. You can use a linear time algorithm to do that.
Runtime: O(w2h2)
A more pragmatic approach would be to hash all submatrices and check for duplicates that way. This would have the same expected runtime.
-
h,w <= 500, O(w²h²) will take too long, won't it (at least for a programming contest standard)? – Juan Lopes Mar 19 '14 at 01:45
-
@JuanLopes: I don't know, it will probably run in under a minute by my estimate. In a contest this would definitely be much too slow, but maybe it's not that kind of problem – Niklas B. Mar 19 '14 at 01:46
-
I don't see many statements like "height h and width w (h,w <= 500)" outside contests :P – Juan Lopes Mar 19 '14 at 01:49
-
@Juan Me neither. Would be interesting to see whether there is a subquadratic algorithm for this – Niklas B. Mar 19 '14 at 01:50
-
Idea about hashing submatrices seems good for my problem, but I don't know way how to hash it. – NightDev Mar 19 '14 at 01:53
-
@user3333971 You can just use some polynomial hash function like `sum p^(w*i+j) * a(i,j) mod q`, where p and q are primes, q is large (64-bit or so) and i,j are the indices of the cells relative to the upper left corner of the submatrix. You can even use `q = 2^64` to make your life really easy (but the hash function will not be as good then). Of course you need to reuse subresults to achieve quadratic runtime, but the hash function above can be used that way – Niklas B. Mar 19 '14 at 01:56
-
@JuanLopes Check out the other answer, it will be able to solve this even under contest conditions. In fact I have to say it's pretty creative – Niklas B. Mar 19 '14 at 03:34