I want to build a numpy array from the result of itertools.product. My first approach was simply:
from itertools import product
import numpy as np
max_init = 6
init_values = range(1, max_init + 1)
repetitions = 12
result = np.array(list(product(init_values, repeat=repetitions)))
This code works well for "small" values of repetitions (like <= 4), but with "large" values (>= 12) it completely hogs the memory and crashes. I assumed that building the list was what ate all the RAM, so I searched for a way to build the array directly. I found "Numpy equivalent of itertools.product" and "Using numpy to build an array of all combinations of two arrays".
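As a rough check on that assumption, the size of the final array can be estimated before building anything. The sketch below only does the arithmetic for the values used above; the choice of dtypes to compare is mine, not something from the linked questions.

```python
import numpy as np

max_init = 6
repetitions = 12

n_rows = max_init ** repetitions   # number of tuples product() would yield
n_cells = n_rows * repetitions     # total number of array elements

# Bytes needed for the finished array alone, for two candidate dtypes
for dtype in (np.int64, np.uint8):
    n_bytes = n_cells * np.dtype(dtype).itemsize
    print(f"{np.dtype(dtype).name}: {n_bytes / 1e9:.1f} GB")
# int64: 209.0 GB
# uint8: 26.1 GB
```

So even without the intermediate list, the array itself would not fit in the memory of a typical laptop.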
So, I tested the following alternatives:
Alternative #1:
results = np.empty((max_init**repetitions, repetitions))
for i, row in enumerate(product(init_values, repeat=repetitions)):
    results[i] = row
Alternative #2:
init_values_args = [init_values] * repetitions
results = np.array(np.meshgrid(*init_values_args)).T.reshape(-1, repetitions)
Alternative #3:
results = np.indices([max_init] * repetitions).reshape(repetitions, -1).T + 1
#1 is extremely slow. I didn't have enough patience to let it finish (I stopped it after a few minutes of processing on a 2017 MacBook Pro). #2 and #3 eat all the memory until the Python interpreter crashes, as with the initial approach.
After that, I thought that I could express the same information in a different way that was still useful for me: a dict where the keys would be all the possible (sorted) combinations, and the values would be the counts of those combinations. So I tried:
Alternative #4:
from collections import Counter
def sorted_product(iterable, repeat=1):
    for el in product(iterable, repeat=repeat):
        yield tuple(sorted(el))

def count_product(repetitions=1, max_init=6):
    init_values = range(1, max_init + 1)
    sp = sorted_product(init_values, repeat=repetitions)
    counted_sp = Counter(sp)
    return np.array(list(counted_sp.values())), \
        np.array(list(counted_sp.keys()))

cnt, values = count_product(repetitions=repetitions, max_init=max_init)
But the line counted_sp = Counter(sp), which triggers getting all the values of the generators, is also too slow for my needs (it also took several minutes before I canceled it).
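One possible way around the slow Counter pass, sketched here under my own assumptions rather than taken from any answer: the sorted tuples are exactly what itertools.combinations_with_replacement yields, and the number of orderings of each one is a multinomial coefficient. The helper name count_sorted_product is hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial

import numpy as np


def count_sorted_product(repetitions=1, max_init=6):
    """Return (counts, values) for every sorted outcome without iterating product()."""
    init_values = range(1, max_init + 1)
    n_perms = factorial(repetitions)
    values, counts = [], []
    for combo in combinations_with_replacement(init_values, repetitions):
        # Orderings of this multiset: n! / (c1! * c2! * ...),
        # where c1, c2, ... are how often each distinct value occurs.
        denom = 1
        for c in Counter(combo).values():
            denom *= factorial(c)
        values.append(combo)
        counts.append(n_perms // denom)
    return np.array(counts, dtype=np.int64), np.array(values)


cnt, values = count_sorted_product(repetitions=12, max_init=6)
print(values.shape)           # (6188, 12)
print(cnt.sum() == 6 ** 12)   # True
```

For 12 repetitions this visits 6188 sorted tuples instead of 6**12 raw ones, so it should finish almost instantly.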
Is there another way to generate the same data (or a different data structure containing the same information) that does not have the mentioned shortcomings of being too slow or using too much memory?
PS: I tested all the implementations above against my tests with a small value of repetitions, and all the tests passed, so they give consistent results.
I hope that editing the question is the best way to expand it. Otherwise, let me know, and I'll edit the post where I should.
After reading the first two answers below and thinking about it, I agree that I am approaching the issue from the wrong angle. Instead of going with a "brute force" approach, I should have used probabilities and worked from there.
My intention is, later on, for each combination, to:
- Count how many values are under a threshold X.
- Count how many values are equal to or over threshold X and below a threshold Y.
- Count how many values are over threshold Y.
And then group the combinations that have the same counts.
As an illustrative example: If I roll 12 dice of 6 sides, what's the probability of having M dice with a value <3, N dice with a value >=3 and <4, and P dice with a value >5, for all possible combinations of M, N, and P?
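A minimal sketch of that probability-based approach, using the multinomial distribution. It assumes the three groups partition the faces as value < X, X <= value < Y, and value >= Y; the function name and the default thresholds x=3 and y=5 are placeholders of mine, not values fixed by the question. It needs Python 3.8+ for math.comb.

```python
from math import comb


def category_probabilities(n_dice=12, sides=6, x=3, y=5):
    """Map (M, N, P) to the probability of M low, N middle and P high dice."""
    below = x - 1            # faces with value < x
    middle = y - x           # faces with x <= value < y
    above = sides - y + 1    # faces with value >= y
    total = sides ** n_dice
    probs = {}
    for m in range(n_dice + 1):
        for n in range(n_dice - m + 1):
            p = n_dice - m - n
            # Ways to choose which dice land in each group ...
            ways = comb(n_dice, m) * comb(n_dice - m, n)
            # ... times the number of face assignments within each group
            outcomes = ways * below**m * middle**n * above**p
            probs[(m, n, p)] = outcomes / total
    return probs


probs = category_probabilities()
print(len(probs))        # 91 possible (M, N, P) triples for 12 dice
print(probs[(4, 4, 4)])  # probability of exactly 4 dice in each group
```

This never enumerates individual rolls: with 12 dice there are only 91 (M, N, P) triples to evaluate.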
So, I think that I'll close this question in a few days while I go with this new approach. Thank you for all the feedback and your time!