9

I'd like to generate matrices of size mxn and rank r, with elements coming from a specified finite set, e.g. {0,1} or {1,2,3,4,5}. I want them to be "random" in some very loose sense of that word, i.e. I want to get a variety of possible outputs from the algorithm with distribution vaguely similar to the distribution of all matrices over that set of elements with the specified rank.

In fact, I don't actually care that it has rank r, just that it's close to a matrix of rank r (measured by the Frobenius norm).

When the set at hand is the reals, I've been doing the following, which is perfectly adequate for my needs: generate matrices U of size mxr and V of nxr, with elements independently sampled from e.g. Normal(0, 2). Then U V' is an mxn matrix of rank r (well, <= r, but I think it's r with high probability).

If I just do that and then round to binary / 1-5, though, the rank increases.

It's also possible to get a lower-rank approximation to a matrix by doing an SVD and taking the first r singular values. Those values, though, won't lie in the desired set, and rounding them will again increase the rank.

This question is related, but accepted answer isn't "random," and the other answer suggests SVD, which doesn't work here as noted.

One possibility I've thought of is to make r linearly independent row or column vectors from the set and then get the rest of the matrix by linear combinations of those. I'm not really clear, though, either on how to get "random" linearly independent vectors, or how to combine them in a quasirandom way after that.

(Not that it's super-relevant, but I'm doing this in numpy.)


Update: I've tried the approach suggested by EMS in the comments, with this simple implementation:

real = np.dot(np.random.normal(0, 1, (10, 3)), np.random.normal(0, 1, (3, 10)))
bin = (real > .5).astype(int)
rank = np.linalg.matrix_rank(bin)
niter = 0

while rank > des_rank:
    cand_changes = np.zeros((21, 5))
    for n in range(20):
        i, j = random.randrange(5), random.randrange(5)
        v = 1 - bin[i,j]
        x = bin.copy()
        x[i, j] = v
        x_rank = np.linalg.matrix_rank(x)
        cand_changes[n,:] = (i, j, v, x_rank, max((rank + 1e-4) - x_rank, 0))
    cand_changes[-1,:] = (0, 0, bin[0,0], rank, 1e-4)

    cdf = np.cumsum(cand_changes[:,-1])
    cdf /= cdf[-1]
    i, j, v, rank, score = cand_changes[np.searchsorted(cdf, random.random()), :]
    bin[i, j] = v
    niter += 1
    if niter % 1000 == 0:
        print(niter, rank)

It works quickly for small matrices but falls apart for e.g. 10x10 -- it seems to get stuck at rank 6 or 7, at least for hundreds of thousands of iterations.

It seems like this might work better with a better (ie less-flat) objective function, but I don't know what that would be.


I've also tried a simple rejection method for building up the matrix:

def fill_matrix(m, n, r, vals):
    assert m >= r and n >= r
    trans = False
    if m > n: # more columns than rows I think is better
        m, n = n, m
        trans = True

    get_vec = lambda: np.array([random.choice(vals) for i in range(n)])

    vecs = []
    n_rejects = 0

    # fill in r linearly independent rows
    while len(vecs) < r:
        v = get_vec()
        if np.linalg.matrix_rank(np.vstack(vecs + [v])) > len(vecs):
            vecs.append(v)
        else:
            n_rejects += 1
    print("have {} independent ({} rejects)".format(r, n_rejects))

    # fill in the rest of the dependent rows
    while len(vecs) < m:
        v = get_vec()
        if np.linalg.matrix_rank(np.vstack(vecs + [v])) > len(vecs):
            n_rejects += 1
            if n_rejects % 1000 == 0:
                print(n_rejects)
        else:
            vecs.append(v)
    print("done ({} total rejects)".format(n_rejects))

    m = np.vstack(vecs)
    return m.T if trans else m

This works okay for e.g. 10x10 binary matrices with any rank, but not for 0-4 matrices or much larger binaries with lower rank. (For example, getting a 20x20 binary matrix of rank 15 took me 42,000 rejections; with 20x20 of rank 10, it took 1.2 million.)

This is clearly because the space spanned by the first r rows is too small a portion of the space I'm sampling from, e.g. {0,1}^10, in these cases.

We want the intersection of the span of the first r rows with the set of valid values. So we could try sampling from the span and looking for valid values, but since the span involves real-valued coefficients that's never going to find us valid vectors (even if we normalize so that e.g. the first component is in the valid set).

Maybe this can be formulated as an integer programming problem, or something?

Community
  • 1
  • 1
Danica
  • 28,423
  • 6
  • 90
  • 122
  • When your matrix is defined over finite sets, like `{0,1}` or `{1,2,3,4,5}` are you also using operations that are consistent with a vector space on top of those sets? If not, you're going to have problems. (e.g., you'd have to do arithmetic mod 5 on the elements `{0,1,2,3,4}` to get a valid group of matrices over that set. You can't use real-valued arithmetic and expect your random draws, linear combinations, etc., to be closed.) – ely Apr 12 '12 at 22:24
  • @EMS Sorry, forgot to mention that I'm using the normal real vector space. That's what I want: a matrix over R but whose elements come from these finite sets. If I did the linear combination thing, I'd obviously have to be careful to do it in such a way that the elements still lie in the set in question, but I want the rank w.r.t. the reals to be `r`. – Danica Apr 12 '12 at 22:25
  • I don't think you can mix-and-match the underlying field like that. If the definition of linear independence includes real-valued coefficients on the entries in the rows/columns of your matrix, then the span of some columns can be huge subsets of the matrices over R, not the matrices over your finite set. If you're treating the matrices like they must be over that finite set, then you have to treat the operations on the matrix entries in a way that's consistent with that finite set. You need the set `{0,1,2,3,4}` equipped with its own addition and multiplication. – ely Apr 12 '12 at 22:29
  • Clearly matrices over R with binary or 1-5 values of given ranks `r` exist; I need a way to generate them. I don't think I can find them for a given rank in e.g. `Z_5`, because that's not the same thing: `(1,2)` and `(2,1)` are linearly independent over the reals but dependent over `Z_3`, since `2*(1,2)=(2,1)`. – Danica Apr 12 '12 at 22:33
  • To make matters worse, thinking about the binary case won't help much, because it's purely a matter of lucky convenience that things like `[1, 0]` and `[0, 1]` are independent in R^2. For instance, if you're thinking about length-10 vectors from `{1,2,3,4,5}`, there will only be 5^{10} such vectors. There's no special reason why there couldn't exist two vectors inside that set whose span contains all the rest of the vectors, when the span is taken according to real-valued arithmetic. – ely Apr 12 '12 at 22:34
  • @EMS: If you make it over `{0,1,2,3,4}` (which is perfectly equivalent for my purposes), then the standard basis vectors `e_1`, ..., `e_10` are contained in the set of 5^10 such vectors, so you necessarily need 10 vectors to span it. Good point, though -- I should probably be using 0..4 instead of 1..5. – Danica Apr 12 '12 at 22:41
  • Sure, but that's because you include the zero element of the underlying field. If you don't, then it becomes an affine space and things change. So simply by moving to include 0, it changes the whole complexion of the subspace drastically. – ely Apr 12 '12 at 22:43
  • I think progress can be made on this, but this is by no means an easy problem. It reminds me of the work done on sampling binary matrices with fixed row and column sums, something about which highly regarded research papers are still being published. I'm thinking about what kinds of multiples of a vector from your finite set can possibly produce other vectors in the finite set. It depends non-trivially on whether the elements include 0 or not, whether they are all divisible by common numbers also in the finite set, etc. I doubt that an easy, quick solution to this already exists. – ely Apr 12 '12 at 22:45
  • 1
    Actually, probably a really good way to try to sample this is with the Metropolis-Hastings algorithm. Make a random initial guess matrix with entries only from your finite set, and that at each iteration, select a random place to change (swapping its entry for another one from the finite set). Compute some measure of the change that 'measures' linear independence, and accept the change proportionally to that measure. You'll reject often, but after a lot of iterations, you'll have something extremely close to what you want. – ely Apr 12 '12 at 22:49
  • Obviously, that might need some modifications. But I believe that's the idea behind the best modern algorithms for sampling from the binary fixed-row-sum matrices as well. – ely Apr 12 '12 at 22:49
  • @EMS Any suggestions for a good measure of linear dependence? I'm trying a simple implementation scoring with `max(old_rank - new_rank + 1e-4, 0)`, but the objective function is too flat and e.g. 10x10 binary matrices always seem to get stuck around rank 6 or 7 for tens/hundreds of thousands of iterations (evaluating 20 options per iteration). – Danica Apr 12 '12 at 23:29
  • Probably you want something like `exp( -1.0*abs( current_rank - desired_rank ) )` but it might require some testing. – ely Apr 13 '12 at 04:36
  • @EMS I don't think so...the problem is that rank is too flat, i.e. after a certain point all the options always have the same rank, and I either need some kind of measure that will recognize steps towards a lower rank matrix, or bigger, smarter steps than permuting one element randomly. – Danica Apr 13 '12 at 04:52
  • You can try making proposals that change a bunch of entries, and have the number of entries change as the score function gets better. It would be a form of the reheating as a function of cost idea from simulated annealing. But in the end, this is a combinatorial optimization problem, and they do usually involve miserably flat/slow score landscapes. Think about finding optimal configurations of atoms, or solving traveling salesman globally, with Metropolis. It takes a loooong time, but in many cases no one knows a better sampler yet. Time to move to a GPU maybe to do many more proposals. – ely Apr 13 '12 at 05:01
  • By the way, I should also say that this is by far one of the most useful stackoverflow questions I've ever seen and it's tremendously good that you asked it and are sharing your progress with people. – ely Apr 13 '12 at 05:02
  • I would be very careful about using a rejection method. Once an acceptance becomes a rare event (what this means exactly is vague, but orders of magnitude more rejections than acceptances) there is the general rule that 'rare events will occur in the least likely of all unlikely ways'. This means that each sample may be very similar. Maybe acceptance only happen if the rank is much smaller than r, or something similar. I suppose it depends on the 'randomness' you are looking for. Methods like Metropolis-Hastings should handle this phenomenon appropriately whereas an ad hoc method may not. – Daniel Johnson Apr 13 '12 at 05:24
  • Also, a nice way to characterize the distribution you want to sample from is such that every matrix satisfying certain conditions (all entries are in a given set and with rank r for example) are given equal likelihood: the uniform distribution on that class of matrices. As discussed above, this class is not easily describable. It is easy to sample matrices with entries from a given set and it is easy to sample matrices with a given rank, but the intersection of these two classes is small. – Daniel Johnson Apr 13 '12 at 05:41

3 Answers3

2

My friend, Daniel Johnson who commented above, came up with an idea but I see he never posted it. It's not very fleshed-out, but you might be able to adapt it.

If A is m-by-r and B is r-by-n and both have rank r then AB has rank r. Now, we just have to pick A and B such that AB has values only in the given set. The simplest case is S = {0,1,2,...,j}.

One choice would be to make A binary with appropriate row/col sums that guaranteed the correct rank and B with column sums adding to no more than j (so that each term in the product is in S) and row sums picked to cause rank r (or at least encourage it as rejection can be used).

I just think that we can come up with two independent sampling schemes on A and B that are less complicated and quicker than trying to attack the whole matrix at once. Unfortunately, all my matrix sampling code is on the other computer. I know it generalized easily to allowing entries in a bigger set than {0,1} (i.e. S), but I can't remember how the computation scaled with m*n.

ely
  • 74,674
  • 34
  • 147
  • 228
2

I am not sure how useful this solution will be, but you can construct a matrix that will allow you to search for the solution on another matrix with only 0 and 1 as entries. If you search randomly on the binary matrix, it is equivalent to randomly modifying the elements of the final matrix, but it is possible to come up with some rules to do better than a random search.

If you want to generate an m-by-n matrix over the element set E with elements ei, 0<=i<k, you start off with the m-by-k*m matrix, A:

Generator matrix

Clearly, this matrix has rank m. Now, you can construct another matrix, B, that has 1s at certain locations to pick the elements from the set E. The structure of this matrix is:

Selector matrix

Each Bi is a k-by-n matrix. So, the size of AB is m-by-n and rank(AB) is min(m, rank(B)). If we want the output matrix to have only elements from our set, E, then each column of Bi has to have exactly one element set to 1, and the rest set to 0.

If you want to search for a certain rank on B randomly, you need to start off with a valid B with max rank, and rotate a random column j of a random Bi by a random amount. This is equivalent to changing column i row j of A*B to a random element from our set, so it is not a very useful method.

However, you can do certain tricks with the matrices. For example, if k is 2, and there are no overlaps on first rows of B0 and B1, you can generate a linearly dependent row by adding the first rows of these two sub-matrices. The second row will also be linearly dependent on rows of these two matrices. I am not sure if this will easily generalize to k larger than 2, but I am sure there will be other tricks you can employ.

For example, one simple method to generate at most rank k (when m is k+1) is to get a random valid B0, keep rotating all rows of this matrix up to get B1 to Bm-2, set first row of Bm-1 to all 1, and the remaining rows to all 0. The rank cannot be less than k (assuming n > k), because B_0 columns have exactly 1 nonzero element. The remaining rows of the matrices are all linear combinations (in fact exact copies for almost all submatrices) of these rows. The first row of the last submatrix is the sum of all rows of the first submatrix, and the remaining rows of it are all zeros. For larger values of m, you can use permutations of rows of B0 instead of simple rotation.

Once you generate one matrix that satisfies the rank constraint, you may get away with randomly shuffling the rows and columns of it to generate others.

vhallac
  • 13,301
  • 3
  • 25
  • 36
2

How about like this?

rank = 30
n1 = 100; n2 = 100
from sklearn.decomposition import NMF
model = NMF(n_components=rank, init='random', random_state=0)
U = model.fit_transform(np.random.randint(1, 5, size=(n1, n2)))
V = model.components_
M = np.around(U) @ np.around(V)
diadochos
  • 242
  • 1
  • 2
  • 10
  • This is an interesting approach, but (a) it gives a bunch of elements that are too large (for one run `np.bincount(M.astype(int).ravel())` gives `[1496, 3057, 2973, 1720, 597, 134, 21, 2]`, where that last number means there are two `7`s), and (b) the rank ended up being 29 rather than 30. – Danica Mar 29 '18 at 17:18
  • (In any case, I'm a good six years past the project where I wanted this. But still.) – Danica Mar 29 '18 at 17:19
  • Mmm.. I see. Thanks for the feedback. – diadochos Apr 03 '18 at 11:31