4

I've stated this question in graph theory terms, but that conceptualization isn't necessary.

What I'm trying to do, using Python, is produce a matrix of zeros and ones, where every row has the same number of ones and every column has the same number of ones. The number for rows will not be the same as the number for columns when the number of rows (sending nodes) does not equal the number of columns (receiving nodes) -- which is something I'm allowing.

It makes sense to me to do this in numpy, but there may be other packages (like networkx?) that would help.

Here's the function I'm looking to write with the desired inputs and outputs:

n_pre = 4  # number of nodes available to send a connection
n_post = 4  # number of nodes available to receive a connection
p = 0.5  # proportion of all possible connections that exist

mat = generate_mat(n_pre, n_post, p)

print mat

The output would be, for example:

[[0, 1, 0, 1],
 [1, 0, 1, 0],
 [1, 1, 0, 0],
 [0, 0, 1, 1]]

Notice, every column and every row has two ones in it. Aside from this constraint, the positions of the ones should be random (and vary from call to call of this function).

In graph theory terms, this means every node has an in-degree of 2 and an out-degree of 2 (50% of all possible connections, as specified with p = 0.5).

abcd
  • 10,215
  • 15
  • 51
  • 85
  • I would not be surprised to see that uniform-sampling is NP-hard. If you are desperate and don't get something out of this question, you can adapt [this](https://stackoverflow.com/a/47208133/2320035) approach. – sascha Aug 14 '18 at 20:05
  • I still feel like we're confusing concepts here. I can grok n_pre and n_post as the cardinality of the sets U and V of a bipartite graph (sending and receiving), but p is still mysterious. The "proportion of all possible connections that exist" makes it sound like we're counting possible _edges_, but if we have n_pre != n_post, then we can't just multiply them both by p, because the total sum of the in-degrees has to equal the total sum of the out-degrees (every edge starts somewhere and goes somewhere.) – DSM Aug 14 '18 at 20:55
  • @DSM it all works out as far as i can tell . . . say n_pre = 80, n_post = 10, p = 0.2 . . . then each of the 10 receives 16 connections . . . each of the 80 sends out 2 connections . . . the total in-degree is 10 * 16 = 160, the total out-degree is 2 * 80 = 160 . . . and the proportion of all possible edges is 160 / 800 = 0.2. – abcd Aug 14 '18 at 22:38

2 Answers2

3

For a square matrix, what you describe is the adjacency matrix of a random k-regular directed graph, and there are known algorithms to generate such graphs. igraph implements one:

# I think this is how you call it - it's an instance method for some reason.
igraph.Graph().K_Regular(n, k, directed=True)

networkx has a function for random k-regular undirected graphs:

networkx.random_regular_graph(k, n)

For a non-square matrix, what you describe is isomorphic to a random biregular graph. I have found no convenient existing implementation for random biregular graphs, but the term should be a good starting point for searching for known algorithms.

user2357112
  • 260,549
  • 28
  • 431
  • 505
  • I have a network with `G.number_of_nodes() = 37764`, `G.number_of_edges() = 518151`. and average degree `2*G.number_of_edges()/G.number_of_nodes() = 27.44153161741341`. Any suggestions how to create a regular graph having same nodes and edges? I am confused as the average degree is not an integer. What to do in this case? – thepunitsingh Dec 16 '20 at 09:16
  • @thepunitsingh: Not possible. It's like asking for a 3-sided square. – user2357112 Dec 16 '20 at 09:19
  • @thepunitsingh: Also, post new questions as questions, not as comments on unrelated questions. – user2357112 Dec 16 '20 at 09:20
  • Yes sure, I'll post it as a question. Just trying to get an idea first how things should be framed. – thepunitsingh Dec 16 '20 at 09:22
1

First, do the pre-work so that we have available the size of the square matrix and the population pop of each row and column. Now, initialize a matrix with pop ones on the diagonal. For n = 6 and pop = 3, you'd have

[[1, 1, 1, 0, 0, 0]
 [0, 1, 1, 1, 0, 0]
 [0, 0, 1, 1, 1, 0]
 [0, 0, 0, 1, 1, 1]
 [1, 0, 0, 0, 1, 1]
 [1, 1, 0, 0, 0, 1]]

Now, apply your friendly neighborhood random shuffle operation to the columns, then the rows (or in the other order). There's your matrix. A shuffle of rows-only or columns-only does not change the population on either axis.

Prune
  • 76,765
  • 14
  • 60
  • 81
  • 2
    That can only generate a small fraction of all possible graphs matching the constraints, though. – user2357112 Aug 14 '18 at 20:21
  • Hmmm .... do you have a counter-example for 3x3 or 4x4, perhaps? I'm not seeing the restriction, but I can see where I might have missed an insight. – Prune Aug 14 '18 at 20:29
  • 1
    Never mind ... I see a problem. This will produce only one class in each N ==> N projection. – Prune Aug 14 '18 at 20:36