How to solve this matrix problem to collect all the seeds?

Question

Problem description:

Hammy the hamster found a building with seeds. Help him collect them all.

The building is arranged in a matrix of rooms, with seeds in some rooms. Once in the building, Hammy will only run in a straight line, all the way through the building and collects all the seeds on that line. The building has many entrances and Hammy can enter at any row or column as he wishes.

He wants to collect all the seeds in the building with as few runs as possible.

Here is a sample layout of the building:

+---+---+---+---+---+
| * |   | * |   |   |
+---+---+---+---+---+
| * | * |   |   |   |
+---+---+---+---+---+
| * |   |   |   | * |
+---+---+---+---+---+
|   |   |   | * |   |
+---+---+---+---+---+
|   |   |   | * |   |
+---+---+---+---+---+

This building has 25 rooms (5X5) but only 8 rooms have seeds (rooms containing '*').

We need to find the minimum runs to collect all the seeds.

My approach:

I tried to solve it with some Greedy Approach, here it is:

1) Start with row/column that contains maximum rooms of seeds (For example: column 1 for here).

2) Updating rows/columns (no of rooms that contain seeds).

3) Repeating steps 1 & 2 until all the seeds are collected.

So, for this example when I apply my approach,

It starts with Column 1 (it contains 3 rooms),

after updating, next will be Column 4,

Here is the building situation right now,

+--+---+---+--+---+
|  |   | * |  |   |
+--+---+---+--+---+
|  | * |   |  |   |
+--+---+---+--+---+
|  |   |   |  | * |
+--+---+---+--+---+
|  |   |   |  |   |
+--+---+---+--+---+
|  |   |   |  |   |
+--+---+---+--+---+

Now three rooms contain seeds and all on different rows or columns, I needs to take all of them, as a result I finished with 5 runs, which is the maximum (taking all rows or all columns is always an answer withing thinking about the seeds at all). Without any calculation anyone can reach here. Runs containing all rows or columns.

But this problem can be solved in 4 runs.

I'm looking for an approach/algorithm to find the minimum runs (which rows and/or columns I need to run).

Sometimes the greedy approach is not the best one; this is one of them. You will have to write an algorithm that also tries the groups of 2. — Botje, Feb 26 '20 at 18:12
Maybe a different approach is to focus on reducing the number of rows/colums where there are seeds remaining. By doing that it would come up with column 4 (frees two rows), and row 1, 2, (each freeing 1 column) and 3 (frees 2 colums). I don't know that this approach will work for the general case though. — Wiingreen, Feb 26 '20 at 18:15
Rather than greedily *accepting* the line with the *most* seeds, your example is better solved by *rejecting* the *line* with the *fewest* seeds (and taking the necessary cross-lines). In fact, in your example this doesn't even involve any ambiguous decisions, since a line with only one seed is never better than its cross-line. (But in the general case this won't always give the optimal solution, either.) — ruakh, Feb 26 '20 at 18:37

User_67128 · Accepted Answer · 2020-03-02T18:38:47.063

The matrix can form a bipartite graph, all rows can form the vertices on left and all columns on the right. If there are seeds in a cell, it will establish an edge between left & right. So, the given example will form the following bipartite graph.

Now all we need to select a set of vertices so that at least one end point of every edge will be included on that vertex set. In graph theory this is known as vertex cover problem. https://en.wikipedia.org/wiki/Vertex_cover

To achieve minimum vertex cover we need to find the maximum matching of the graph. Hopcroft–Karp algorithm provides us to find a maximum matching in polynomial time (in worst case O(n^2.5). Once we have a maximum matching, Kőnig's theorem helps us to find the minimum vertex cover in O(n^2) time.

How it works:

Say M is the maximum matching, initially it is set to empty. From the bipartite graph, we need to find maximal set of vertex-disjoint shortest augmenting paths P.

here P = {r1c1, r2c2, r3c5, r4c4}

M = M (symmetric difference) P = {r1c1, r2c2, r3c5, r4c4}

in this example there are no more augmenting path remaining. So the maximum matching is achieved.

Now using Kőnig's theorem we need to find the minimum vertex cover from the matching M. Here the vertex cover will be {r1, r2, r3, c4}

So the minimum run is 4.

Overall complexity of this process is O(E * (V^0.5)), E & v are the length of set of edges and vertices. For a sparse matrix it will run in near-linear time.

Another example:

+---+---+---+---+---+
|   | * | * |   | * |
+---+---+---+---+---+
|   |   |   | * |   |
+---+---+---+---+---+
| * | * |   | * |   |
+---+---+---+---+---+
|   |   |   | * |   |
+---+---+---+---+---+

Bipartite graph:

Maximum matching M = {r1c2, r2c4, r3c1}

Minimum vertex cover = {r1, r3, c4}

So the minimum run is 3.

If you are only interested about the minimum run (number only, no need to show details) then you don't need to implement Kőnig's theorem, because the size of maximum matching is the length of minimum vertex cover set.

Could you please explain the process from matching to cover a little more. Thanks. — abhimanyue, Mar 03 '20 at 16:04
I implemented that part from the help of https://stackoverflow.com/questions/42836016/algorithm-for-minimum-vertex-cover-in-bipartite-graph. — User_67128, Mar 03 '20 at 23:39

David Eisenstat · Answer 2 · 2020-03-02T14:10:10.830

This problem is basically vertex cover in a bipartite graph. One set of nodes in the graph corresponds to the rows. The other set corresponds to columns. The edges correspond to seeds, where the endpoints of each edge correspond to the coordinates of the seed corresponding to the edge.

By Kőnig's theorem, the size of a minimum cover is equal to the size of a maximum matching, which can be computed efficiently using Hopcroft–Karp.

Wiingreen · Answer 3 · 2020-02-28T08:09:23.233

Maybe a different approach is to focus on reducing the number of rows/columns where there are seeds remaining. You can find the rows/columns by counting the number of seeds in each row/column like this.

  3   1   1   2   1
+---+---+---+---+---+
| * |   | * |   |   | 2
+---+---+---+---+---+
| * | * |   |   |   | 2
+---+---+---+---+---+
| * |   |   |   | * | 2
+---+---+---+---+---+
|   |   |   | * |   | 1
+---+---+---+---+---+
|   |   |   | * |   | 1
+---+---+---+---+---+

Then you take the sum of the rows with seeds in them in a row and divides them by the number of seeds. This gives you a score for this column. Do this for all rows and columns like this.

For instance, take row 1 which shares seeds with columns 1 and 3. Column 1 has 3 seeds in it and column 3 has 1 together that is 4 seeds. Divide this number by the number of rows and you get 2, therefor the score for row 1 is 2.

  2   2   2   1   2
+---+---+---+---+---+
| * |   | * |   |   | 2
+---+---+---+---+---+
| * | * |   |   |   | 2
+---+---+---+---+---+
| * |   |   |   | * | 2
+---+---+---+---+---+
|   |   |   | * |   | 2
+---+---+---+---+---+
|   |   |   | * |   | 2
+---+---+---+---+---+

Then you take the row/column with the lowest number that is not zero. By doing that it would come up with column 4 (frees two rows, score 1).

              ↓
+---+---+---+---+---+
| * |   | * |   |   |
+---+---+---+---+---+
| * | * |   |   |   |
+---+---+---+---+---+
| * |   |   |   | * |
+---+---+---+---+---+
|   |   |   | * |   |
+---+---+---+---+---+
|   |   |   | * |   |
+---+---+---+---+---+

And then row 1, 2, (each freeing 1 column, score 2) and 3 (frees 2 columns, score 1) in any order.

+---+---+---+---+---+
| * |   | * |   |   | ←
+---+---+---+---+---+
| * | * |   |   |   | ←
+---+---+---+---+---+
| * |   |   |   | * | ←
+---+---+---+---+---+
|   |   |   |   |   |
+---+---+---+---+---+
|   |   |   |   |   |
+---+---+---+---+---+

I don't know that this approach will work for the general case though.

How can you find such row/column that reduces more rows/columns? What will be the complexity? — User_67128, Feb 26 '20 at 18:30
@ManojBanik Good question, my thoughts were to go through each row and column and look, but I think this could probably be done more efficiently. I just tried to find an algorithm that would produce the optimal result in this case, since complexity wasn't part of the question. I might update the answer if I figure something out. — Wiingreen, Feb 26 '20 at 18:36
Added example to figure 2 to hopefully better explain the concept — Wiingreen, Feb 28 '20 at 08:09
After selecting a row or column, does it need to update? I think you can furnish you technique a little more. — abhimanyue, Feb 29 '20 at 13:28

Sérgio Castelani · Answer 4 · 2020-03-03T18:48:23.357

To score a line before each run you can basically check:

How many opposite (orthogonal) lines will be cleared if we discard the current one: +1 for each
How many parallel lines will be left to run: -1 for each.

Scores for the first run:

 -4  -4  -4 (-2) -4
+---+---+---+---+---+
| * |   | * |   |   | -3
+---+---+---+---+---+
| * | * |   |   |   | -3
+---+---+---+---+---+
| * |   |   |   | * | -3
+---+---+---+---+---+
|   |   |   | * |   | -4
+---+---+---+---+---+
|   |   |   | * |   | -4
+---+---+---+---+---+

Column 1 will clear 0 rows and leave 4 other columns to run = -4
Column 4 will clear 2 rows and leave 4 other columns to run = -2
Row 1 will clear 1 column and leave 4 other rows to run = -3

Scores for the second run:

 -3  -3  -3  -4  -3
+---+---+---+---+---+
| * |   | * |   |   | (-1)
+---+---+---+---+---+
| * | * |   |   |   | (-1)
+---+---+---+---+---+
| * |   |   |   | * | (-1)
+---+---+---+---+---+
|   |   |   |   |   | -3
+---+---+---+---+---+
|   |   |   |   |   | -3
+---+---+---+---+---+

Can be solved in polynomial time (cols*rows) using dynamic programming (calculate first scores of each row and columns then update only the affected ones when consuming one line).

I don’t get your technique, could you please explain it a little more. Thanks. — abhimanyue, Mar 03 '20 at 16:03

How to solve this matrix problem to collect all the seeds?

4 Answers4

Scores for the first run:

Scores for the second run: