8

I wish to generate 10,000 random binary matrices which have the same number of 1s per row and per column as a given binary matrix.

The matrix is ~500 x ~10,000. There are about 2,000,000 1s. There are no zero rows or columns.

My current method converts the binary matrix into a bipartite adjacency matrix, and performs 1,000,000 random edge switches to guarantee randomness. This takes 13,000 seconds for 1 matrix. I'm coding in python, using a modified version of networkx's double_edge_swap function.

Is there a more efficient way to generate such matrices?

Jonathan Lu
  • 151
  • 1
  • 7
  • 2
    I was looking for the name of this problem. It is the main problem of [discrete tomography](https://en.wikipedia.org/wiki/Discrete_tomography) "which deals with the reconstruction of a binary image from its horizontal and vertical linesums" and for the case of 2 dimensions (pairwise nonparallel lattice directions), the problem is in P. It would be interesting to know what needs 10,000 randomly chosen possible reconstructions. – Dan D. Sep 01 '15 at 06:57
  • You should specify if you need a particular distribution, since different methods might give slightly different distributions. – Veedrac Sep 01 '15 at 16:09
  • It depends if you want to improve only efficient for generate matricies the good solution will be call c( function for generate matrix from python. – ElConrado Sep 01 '15 at 20:10

2 Answers2

2

I think you can first build a special case of such a matrix, and then use numpy.shuffle to shuffle it:

row_sum = 2
col_sum = 1
arr     = np.zeros((5, 10))
#generate a special case, with given row_sum and col_sum
for i in range(row_sum):
    arr.ravel()[i::arr.shape[1]+row_sum] = 1
arr

Out[84]: 
array([[ 1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.]])

np.random.shuffle(arr)
#np.random.shuffle(arr.T) to shuffle the columns
arr
Out[89]: 
array([[ 0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.],
       [ 1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

arr.sum(1) #row sums
Out[90]: array([ 2.,  2.,  2.,  2.,  2.])

arr.sum(0) #col sums
Out[91]: array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
Veedrac
  • 58,273
  • 15
  • 112
  • 169
CT Zhu
  • 52,648
  • 17
  • 120
  • 133
  • I would also suggest to be a bit _lazy_ if possible. We can generate a new matrix just by defining a list of row numbers (`[2, 4, 1, 3, 0]` in the example) and going to the full-scale `np.array` if an assignment should be done, or to some sort of _history of changes_ (but I'm not sure if it's okay for `numpy` to work with dynamic sized arrays). – u354356007 Sep 01 '15 at 05:13
  • Dynamic `numpy` array probably won't work, it was sort of discussed before http://stackoverflow.com/questions/6950456/how-to-create-a-dynamic-array . I guess one probably goes for `Fortran` or `C` for dynamic array. But wait, that is no longer a *lazy* solution, :) – CT Zhu Sep 01 '15 at 05:30
  • 2
    What if the rows are say [6, 5, 6, 4, 6, 7, 4, 5, 4, 4] and the columns [ 3, 6, 5, 7, 2, 8, 3, 3, 4, 10] rather than constants? Even if you had one solution simply shuffling wouldn't always produce others. – Dan D. Sep 01 '15 at 06:23
0

Tried this, and it works

np.mod(np.random.permutation(N*N).reshape(N,N),2)

Example:

>>> np.mod(np.random.permutation(4*4).reshape(4,4),2)  
array([[0, 0, 0, 1],
      [1, 1, 1, 0],
      [1, 0, 0, 1],
      [0, 1, 1, 0]])
>>> np.mod(np.random.permutation(4*4).reshape(4,4),2)  
array([[0, 0, 0, 1],
      [1, 1, 0, 0],
      [1, 1, 1, 1],
      [0, 0, 1, 0]])
codenio
  • 721
  • 10
  • 17