2

Suppose you have one matrix of height h and width w (h,w <= 500) You have to find two submatrices of same size that are equal. Any idea for solving ?

NightDev
  • 35
  • 6

2 Answers2

2

There is an algorithm better than O(w2h2). Let consider an easier version first. Given a matrix M with only lowercase letters, find the maximum sub-string in the matrix. This problem equals to find the longest common substring in M[0]$1M[1]...M[w-1], where we add distinct special characters between M[i] and M[i+1]. Using a suffix array, the longest common substring problem can be solved in linear time, for this case, it can be solved in O(wh).

For the largest submatrices problem, it can be reduces to the substring problem by enumerating all possible heights l<=h, at the same time, the lexicographical order two substring with height l can inherit from the order of substring with height l-1.

As @Niklas B explained. In the first iteration we rank the row suffixes of the matrix using the suffix array. Then in the second step, we rank the suffixes of adjacent 2-row combinations, using radix sort and by reusing the ranks computed in the first iteration. In the third step, we sort the suffixes of adjacent 3-rows etc. We can also maintain LCP arrays for each iteration so that we can find the longest substring that appears twice, using a single pass.

In total, this algorithm is O(h2w) with a linear time suffix array construction algorithm.

Niklas B.
  • 92,950
  • 18
  • 194
  • 224
notbad
  • 2,797
  • 3
  • 23
  • 34
  • Sorry for my poor English, can you understand the first part? – notbad Mar 19 '14 at 03:12
  • Yes, I begin to see what you mean by the second paragraph, but it's a bit hard to understand. – Niklas B. Mar 19 '14 at 03:12
  • I got it now. Very cool idea, but it's really, really hard to understand from your answer alone :/ – Niklas B. Mar 19 '14 at 03:20
  • Yes, so in the first iteration we rank the row suffixes of the matrix using the suffix array. Then in the second step, we rank the suffixes of adjacent 2-row combinations, probably using radix sort with the ranks of the first iteration. In the third step, we sort the suffixes of adjacent 3-rows etc. Is that about correct? – Niklas B. Mar 19 '14 at 03:22
  • Could I add this to my answer? – notbad Mar 19 '14 at 03:25
  • 1
    Sure, feel free. +1 for you, this is definitely a much better solution, OP should consider accepting this instead – Niklas B. Mar 19 '14 at 03:26
0

You can enumerate all vectors (-h < dh < h, 0 <= dw < w) in O(hw). For each such vector, translate the matrix by the vector (dh, dw) and subtract the translated matrix from the original one. In the range where they overlap you want to find the rectangle of zeroes that has the largest area. You can use a linear time algorithm to do that.

Runtime: O(w2h2)

A more pragmatic approach would be to hash all submatrices and check for duplicates that way. This would have the same expected runtime.

Community
  • 1
  • 1
Niklas B.
  • 92,950
  • 18
  • 194
  • 224
  • h,w <= 500, O(w²h²) will take too long, won't it (at least for a programming contest standard)? – Juan Lopes Mar 19 '14 at 01:45
  • @JuanLopes: I don't know, it will probably run in under a minute by my estimate. In a contest this would definitely be much too slow, but maybe it's not that kind of problem – Niklas B. Mar 19 '14 at 01:46
  • I don't see many statements like "height h and width w (h,w <= 500)" outside contests :P – Juan Lopes Mar 19 '14 at 01:49
  • @Juan Me neither. Would be interesting to see whether there is a subquadratic algorithm for this – Niklas B. Mar 19 '14 at 01:50
  • Idea about hashing submatrices seems good for my problem, but I don't know way how to hash it. – NightDev Mar 19 '14 at 01:53
  • @user3333971 You can just use some polynomial hash function like `sum p^(w*i+j) * a(i,j) mod q`, where p and q are primes, q is large (64-bit or so) and i,j are the indices of the cells relative to the upper left corner of the submatrix. You can even use `q = 2^64` to make your life really easy (but the hash function will not be as good then). Of course you need to reuse subresults to achieve quadratic runtime, but the hash function above can be used that way – Niklas B. Mar 19 '14 at 01:56
  • @JuanLopes Check out the other answer, it will be able to solve this even under contest conditions. In fact I have to say it's pretty creative – Niklas B. Mar 19 '14 at 03:34