6

Given a matrix of size m X n with no repetition of values in rows or columns, is there an efficient method of detecting cycles?

For example, here is a sample matrix:

3 5 2 9 7 4
4 3 6 1 8 7
1 4 7 5 2 9
9 8 3 2 6 1
2 6 8 7 3 5

It has at least one permutation cycle of size 3:

3 5 2 9 7 4
4 8 3 1 6 7
1 4 7 5 2 9
9 6 8 2 3 1
2 3 6 7 8 5

The values 3, 6, and 8 in rows 2, 4 and 5 form a cycle.

The problem relates to Kakuro puzzles. Nothing to do with solving them, but trying to detect whether any layout of a particular grid makes it unsuitable. A cycle of any sort would make that particular layout invalid - since the sum of rows and columns is the same for both layouts.

Jim White
  • 211
  • 1
  • 6
  • 1
    How values 3, 6 and 8 form a cycle? can you explain more? I cannot see any relation with 3, 6 and 8. – Pham Trung Jun 04 '15 at 08:44
  • 2
    @PhamTrung: I think it is this: When you remove the first, fourth and last columns and the first and third rows, you get a Latin square with the numbers 3, 6 and 8. – M Oehm Jun 04 '15 at 10:13
  • @MOehm thanks, now I see some connections, and if we rotate 3, 6, and 8 in one row, we also get the next permutation of the next rows. – Pham Trung Jun 04 '15 at 10:20
  • Yes, M Oehm is exactly right. And I got the row numbers wrong which wouldn't have helped. And thanks to my mystery editor. – Jim White Jun 04 '15 at 18:52
  • 1
    I notice that 2 and 7 on rows 1 and 3 is a "cycle" too. – j_random_hacker Jun 04 '15 at 19:59
  • Yes I just found it when testing a simple brute-force search! Well spotted. – Jim White Jun 04 '15 at 21:28
  • I don't have time right now to write a full answer, but I've thought of an O(nmk2^k) algorithm, with k the largest matrix entry. Exponential in k, but very fast if k <= 9 :) The set of numbers along the top row of a cycle must equal the set of numbers along the leftmost column, so for each possible top-left corner (i, j), we can intersect the numbers to its right with the numbers below it to find the set of (<= k) *candidate numbers* C to include alongside x[i,j]. Then try all 2^k subsets of them (sometimes using tricks to eliminate them faster, that I think will apply very frequently). – j_random_hacker Jun 04 '15 at 23:46
  • I've just got a version going. For a given cycle length _p_ and tlc position _(r, c)_, I take every possible col- and row- set combination for completing the cycle, of which there are _C(n-r, p-1)_ possibilities for each. That gives me every distinct candidate-cell matrix. I then count the number of different values _d_ in that set. If _d = p_ then it must be a cycle. This seems to be independent of the number of symbols _k_. – Jim White Jun 05 '15 at 01:32
  • That sounds OK, but the sum of C(n-r, p-1) over all values of p is already O(2^(n-r)) -- and you will actually be summing C(n-r, p-1)^2 over all values of p, which is much larger. Although I wrote O(2^k) for using symbol subsets as I suggested, this is also O(2^(n-r)) (after the O(k) merge step to find the intersection). The advantage of using symbol subsets (plus TLC) over row and column subsets (plus TLC) is twofold: (1) A symbol that appears in the intersection I described simultaneously determines a row *and* column, so there are only O(2^(n-r)) choices to try. (2) ... – j_random_hacker Jun 05 '15 at 16:26
  • ... Given a symbol subset C, you can instantly rule out a symbol c if the entry at (row(c, tlc), col(c, tlc)) is not in C. This will occur often, and will often lead to cascades of symbols being eliminated from contention at a particular TLC. (Here "row(c, tlc)" means "the unique row at or below tlc that contains symbol c in the leftmost column", and "col(c, tlc)" means "the unique column at or to the right of tlc that contains symbol c in the topmost row" -- these can be easily precomputed in O(nmk) time.) – j_random_hacker Jun 05 '15 at 16:28
  • Another trick is that for some given TLC, if there is any c in C such that the x[row(c, tlc), col(c, tlc)] = x[TLC], then you can immediately stop -- since this means you have a 2x2 cycle, whose top-left and bot-right entries are x[TLC], and whose opposite corners are c. :) – j_random_hacker Jun 05 '15 at 16:37

1 Answers1

1

I think you can do this in O(n^3) time, for an n x n grid.

Idea

Consider your example grid, and hypothesize whether the topleft 3 and 5 can end up in a latin sub-square.

(3) (5)  2   9   7   4
 4   8   3   1   6   7
 1   4   7   5   2   9
 9   6   8   2   3   1
 2   3   6   7   8   5

Because we want a latin square, we're forced to include that 3 in the (5) column (all values must appear in each column), and also the nearby 2 (must form a square):

(3) (5)  2   9   7   4
 4   8   3   1   6   7
 1   4   7   5   2   9
 9   6   8   2   3   1
(2) (3)  6   7   8   5

We can keep doing this implication for awhile, but we'll soon run into a problem: the left row doesn't contain a 5. Including the top-left 3 and the top-left 5 leads to a contradiction.

In general, anytime we include 2 values in the same row or the same column, that pair will force up to 4 other values to be included and/or will imply a contradiction. We want to play with this implication structure to quickly eliminate bad solutions, leaving only good ones.

Make a Graph

Since we have this useful implication structure, we should explore it. Create a node for each horizontal and vertical pair of values, and insert directed edges between those nodes whenever a pair implies another pair must be included. Also have a node for "contradiction". For example, the {(0, 0), (0, 1)} pair corresponding to the top-left (3) (5) in the example would have outgoing edges to {(0, 0), (4, 0)}, {(0, 1), (4, 1)}, and contradiction.

The result is a graph with a lot of nodes pointing at contradiction and, potentially, some nodes pointing at each other in a cycle. Rip out the contradiction node and anything that points to it directly or indirectly, and all that's left should be the cycles and any cycle should correspond to a latin square.

Correctness

I'm honestly not sure if this is correct. It's clear that the latin square is not being immediately exhaustively checked for correctness in the sense of each added pair causing all the necessary work to happen... but I think all the bad cases that would be missed are ones where a value is duplicated and that was guaranteed not to happen in the input.

More work needed.

Complexity

There are O(n^3) nodes in the graph because there are O(n^2) pairs in a row or column, and O(n) rows+columns. There are also O(n^3) edges because each node has at most 4 out-edges.

Removing things pointing at contradiction takes time proportional to the number of nodes, assuming you're using edge lists. Just do a backwards flood-fill, following in-edges upstream.

Detecting a cycle takes time proportional to the number of nodes and edges: bucket nodes based on the number of out-nodes they have (at most 4), and keep removing nodes in the 0-out bucket and re-bucketing the affected nodes until done. If there's anything left, it's a cycle.

Since all operations take time proportional to the number of nodes and edges, and we have O(n^3) nodes+edges, the overall algorithm takes O(n^3) time.

Craig Gidney
  • 17,763
  • 5
  • 68
  • 136