I made a random walk generator with Python and NumPy. Given an adjacency matrix, I want to sample random paths from each node. To do so, I wrote this class, which takes as input the neighbors of each node of the graph together with their corresponding transition probabilities:
import numpy as np


class RandomChoice(object):
    def __init__(self, neighbors_choices, neighbors_prob, depth=50):
        C = len(neighbors_choices)
        self.depth = depth
        self.neighbors_choices = neighbors_choices
        self.neighbors_prob = neighbors_prob
        self.index = np.zeros(C, np.uint32)
        self.choices = list()
        for i in range(C):
            self.choices.append(np.random.choice(self.neighbors_choices[i], size=self.depth, p=self.neighbors_prob[i]))

    def __getitem__(self, arg):
        if self.index[arg] == self.depth:
            self.choices[arg] = np.random.choice(self.neighbors_choices[arg], size=self.depth, p=self.neighbors_prob[arg])
            self.index[arg] = 0
        val = self.choices[arg][self.index[arg]]
        self.index[arg] += 1
        return val
And I use it like this:
# Example with a 3x3 matrix
# 1 2 1
# 3 0 1
# 0 1 0
number_of_walks_per_node = 5
number_of_nodes = 3
length_of_walks = 10
choices = [
    [0, 1, 2],
    [0, 2],
    [1]
]
probs = [
    [0.25, 0.5, 0.25],
    [0.75, 0.25],
    [1]
]
randomChoice = RandomChoice(choices, probs, depth=50)
for i in range(number_of_walks_per_node):
    for starting_node in range(number_of_nodes):
        walker_positions = [starting_node]
        for j in range(length_of_walks):
            walker_positions.append(randomChoice[walker_positions[j]])
        print(walker_positions)
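For reference, the `choices` and `probs` lists above can be derived from the adjacency matrix roughly as follows. This is only a minimal sketch: the helper name `neighbors_from_adjacency` is hypothetical (it is not part of my actual code), and it assumes every row has at least one non-zero entry.

```python
import numpy as np


def neighbors_from_adjacency(adjacency):
    """Hypothetical helper: per-node neighbor lists and transition probabilities."""
    adjacency = np.asarray(adjacency, dtype=float)
    choices, probs = [], []
    for row in adjacency:
        neighbors = np.flatnonzero(row)   # column indices with a non-zero weight
        weights = row[neighbors]
        choices.append(neighbors.tolist())
        # Normalize the weights of the row so that they sum to 1.
        # Assumes the row is not all zeros (otherwise this divides by zero).
        probs.append((weights / weights.sum()).tolist())
    return choices, probs


adjacency = [[1, 2, 1],
             [3, 0, 1],
             [0, 1, 0]]
choices, probs = neighbors_from_adjacency(adjacency)
```

For the 3x3 example this reproduces the hand-written `choices` and `probs` lists.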
The idea here is to trade some RAM for the vectorized efficiency of numpy.random.choice by drawing the samples for each node in batches of size depth. But this function is still the bottleneck of the code. I suspect numpy.random.choice spends time checking that the probabilities sum to 1 and that each of them is non-negative. Do you have any idea how I could speed up this code?
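To make the suspicion concrete, one alternative I could imagine is to precompute the cumulative probabilities once per node and then draw by inverse-CDF lookup with `np.searchsorted`, so that no validation runs on each call. This is an untested idea rather than a measured improvement: the function name `sample_neighbors` and the seed are assumptions for illustration, and I have not benchmarked it against numpy.random.choice.

```python
import numpy as np


def sample_neighbors(neighbors, cumulative, size, rng):
    """Draw `size` neighbors by inverse-CDF lookup (hypothetical helper)."""
    u = rng.random(size)                               # uniform samples in [0, 1)
    idx = np.searchsorted(cumulative, u, side="right")
    # Guard against the last cumulative value being slightly below 1
    # because of floating-point rounding in the cumulative sum.
    idx = np.minimum(idx, len(neighbors) - 1)
    return neighbors[idx]


rng = np.random.default_rng(0)
neighbors = np.array([0, 1, 2])
cumulative = np.cumsum([0.25, 0.5, 0.25])   # computed once per node, not per draw
batch = sample_neighbors(neighbors, cumulative, 50, rng)
```

Here `batch` would play the role of one entry of `self.choices` in the class above; the probabilities are assumed to be valid already, since nothing checks them.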