
I have an input matrix of unknown n x m dimensions, populated with 1s and 0s.

For example, a 5x4 matrix:

A = np.array(
  [[1, 0, 0, 0],
   [1, 0, 0, 0],
   [0, 1, 1, 0],
   [0, 1, 1, 0],
   [1, 0, 1, 1]])

Goal

I need to create a 1:1 map between as many rows and columns as possible, using only locations where the element is 1.

What I mean by a 1:1 map is that each row and each column can be used at most once.

The ideal solution has the most mappings possible, i.e. the most rows and columns used. It should also avoid exhaustive combinations or operations that do not scale well to larger matrices (practically, the maximum dimensions should be 100x100, but there is no declared limit, so they could go higher).

Here's a possible outcome for the example above:

array([[ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])

Some more examples:

input:
0 1 1
0 1 0
0 1 1

output (one of several possible ones):
0 0 1
0 1 0
0 0 0 

Another example (this shows one problem that can arise):

input:
0 1 1 1
0 1 0 0
1 1 0 0

a good output (again, one of several):
0 0 1 0
0 1 0 0
1 0 0 0

a bad output (still valid, but with fewer mappings):
0 1 0 0
0 0 0 0
1 0 0 0

To better show how there can be multiple outputs:

input: 
0 1 1
1 1 0

one possible output:
0 1 0
1 0 0

a second possible output:
0 0 1
0 1 0

a third possible output:
0 0 1
1 0 0
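
To make the rules concrete, here is a small helper (the name is just mine, for illustration) that checks whether a candidate output M is a valid 1:1 map of A and, if it is, counts how many mappings it uses:

import numpy as np

def count_mappings(A, M):
    """Return the number of mappings in candidate map M, or -1 if M is invalid."""
    A = np.asarray(A)
    M = np.asarray(M)
    ones_only = np.all(A[M == 1] == 1)      # every selected cell is a 1 in A
    rows_ok = np.all(M.sum(axis=1) <= 1)    # each row is used at most once
    cols_ok = np.all(M.sum(axis=0) <= 1)    # each column is used at most once
    return int(M.sum()) if (ones_only and rows_ok and cols_ok) else -1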

What have I done?

I have a really dumb way of handling it right now, which is not at all guaranteed to work. Basically, I build a filter matrix out of an identity matrix (because it's the perfect map: every row and every column is used exactly once), then randomly swap its columns (n times) and filter the original matrix with it, recording the filter matrix that gives the best result.

My [non] solution:

import random
import numpy as np

# this is the example starting matrix from above
A = np.array(
  [[1, 0, 0, 0],
   [1, 0, 0, 0],
   [0, 1, 1, 0],
   [0, 1, 1, 0],
   [1, 0, 1, 1]])

# add a dummy column of 1s to make A square
new_col = np.ones([5, 1])
A = np.append(A, new_col, axis=1)

# make an identity matrix (the perfect map)
imatrix = np.diag([1]*5)

# randomly swap columns of the identity matrix, keeping the best map found
n = 1000

# this will hold the map that works the best, and its score
best_map_so_far = np.zeros_like(imatrix)
best_score = 0

for i in range(n):
    # swap two random columns
    a, b = random.sample(range(5), 2)
    imatrix[:, [a, b]] = imatrix[:, [b, a]]

    # is this map better than the previous best?
    score = (imatrix * A).sum()
    if score > best_score:
        best_map_so_far = imatrix.copy()
        best_score = score

    # could it be? a perfect map??
    if best_score == A.shape[0]:
        break
    # jk.

# did we still fail?
if best_score != A.shape[0]:
    print('haha')

# drop the dummy column
output = best_map_so_far * A
output = output[:, :-1]

#... wow. it actually kind of works.
tinker tailor
    I don't understand what the rules are for the solution. Can you give some more, smaller examples? Perhaps some 2x2 and 3x3 examples? And indicate if you are looking for "any" solution, or the "best" solution (what makes it best)? And what size are you actual inputs (at maximum)? – John Zwinck Jun 24 '17 at 04:20
  • If anyone has a suggested title I'm all ears; I don't think mine is very intuitive. – tinker tailor Jun 24 '17 at 04:20
  • I'll make some more examples quickly. I'm looking for the *best* solution, which should be any solution that guarantees the maximum number of mappings with the least amount of work. **edit** there's no declared limit on the matrix dimensions, but I think anything over 100 is decreasingly likely (100x100 matrix). – tinker tailor Jun 24 '17 at 04:24

2 Answers


How about this?

let S be the solution vector, as wide as A, containing row numbers.
let Z be a vector containing the number of zeros in each column.

for each row:
    select the cells which contain 1 in A and no value in S.
    select from those cells those with the highest score in Z.
    select from those cells the first (or a random) cell.
    store the row number in the column of S corresponding to the cell.

Does that give you a sufficient solution? If so it should be much more efficient than what you have.
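
In Python the idea might look roughly like this (an untested sketch; the names are just illustrative):

import numpy as np

def greedy_map(A):
    A = np.asarray(A)
    n_rows, n_cols = A.shape
    S = [None] * n_cols                 # S[j] = row assigned to column j
    Z = (A == 0).sum(axis=0)            # number of zeros in each column

    for i in range(n_rows):
        # columns with a 1 in this row that are still unassigned
        candidates = [j for j in range(n_cols) if A[i, j] == 1 and S[j] is None]
        if candidates:
            j = max(candidates, key=lambda c: Z[c])   # prefer the scarcest column
            S[j] = i

    # turn S back into a 0/1 map of the same shape as A
    out = np.zeros_like(A)
    for j, i in enumerate(S):
        if i is not None:
            out[i, j] = 1
    return out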

John Zwinck
  • If I understand correctly, I don't think it fully accounts for collisions. By selecting the first or random cell, you may later on eliminate other rows that only have values in that column. I think you were accounting for it a bit with Z, but there's no guarantee it would work. Still, it is better optimized than mine – tinker tailor Jun 24 '17 at 04:55
  • @tinkertailor: I agree it may not produce the optimal solution in all cases. However, you did say "Any solution that guarantees the maximum number of mappings with the least amount of work." I don't know if there's an optimal solution with less work required. – John Zwinck Jun 24 '17 at 04:57
  • Ah, sorry, I'll try to clear that up. Attaining the maximum number of mappings is the most important. As for the optimization, I just wanted to avoid an exhaustive solution... because you could technically just take my solution and try all possible column orders (actually, that's better and more optimized than my current solution, even though I was avoiding it... whoops). I'll make a note of this to avoid further confusion. – tinker tailor Jun 24 '17 at 05:02

Let me give it a go. The algorithm I suggest will not always give the optimal solution, but maybe somebody can improve it.

  1. You can always interchange two columns or two rows without changing the problem. Further, by keeping track of the changes you can always go back to the original problem.

  2. We are going to fill the main diagonal with 1s as far as it will go. Get the first 1 in the upper left corner by interchanging columns, or rows, or both. Now the first row and column are fixed and we don't touch them anymore. We now try to fill in the second element on the diagonal with 1, and then fix the second row and column. And so on.

  3. If the bottom right submatrix is zero, we should try to bring a 1 there by interchanging two columns or two rows using the whole matrix but preserving the existing 1s in the diagonal. (Here lies the problem. It is easy to check efficiently if one interchange can help. But it could be that at least two interchanges are required, or maybe more.)

  4. We stop when no more 1s can be obtained on the diagonal.

So, while the algorithm is not always optimal, maybe it is possible to come up with extra rules for how to interchange columns and rows so as to populate the diagonal with as many 1s as possible.
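
Here is a rough Python sketch of steps 1, 2 and 4 (step 3, rescuing a zero bottom-right submatrix with further interchanges, is not implemented; the names are just illustrative):

import numpy as np

def fill_diagonal_greedy(A):
    B = np.array(A)                      # work on a copy
    row_perm = list(range(B.shape[0]))   # original index of each current row
    col_perm = list(range(B.shape[1]))   # original index of each current column

    for k in range(min(B.shape)):
        # look for a 1 in the untouched bottom-right submatrix
        hits = np.argwhere(B[k:, k:] == 1)
        if hits.size == 0:
            break                        # step 4: no more 1s can reach the diagonal
        r, c = hits[0] + k
        B[[k, r], :] = B[[r, k], :]      # interchange rows to bring the 1 up
        B[:, [k, c]] = B[:, [c, k]]      # interchange columns to bring it left
        row_perm[k], row_perm[r] = row_perm[r], row_perm[k]
        col_perm[k], col_perm[c] = col_perm[c], col_perm[k]

    return B, row_perm, col_perm

Each diagonal 1 of B then corresponds to a match (row_perm[k], col_perm[k]) in the original matrix, so the result can be mapped back as described in point 1.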

Andrei