My task is to:
- generate a list of unique combinations where each combination has a certain length (var com_len) and contains values (ints) from the given list (var values),
- each combination is created by taking random values from the given list (randomness is very important!),
- each combination must have unique values inside, no value can be repeated inside the combination,
- values in the combination must be sorted,
- appearance of each value within the whole combinations' set is counted (var counter),
- each value must appear in the whole dataset as close to the given number of times as possible (var counter_expected). "as close as possible" means, count each appearing value as the script goes and if there are no more combinations left to create, just end the script.
For example, I need to generate a list of combinations where each combination has a length of 3, has unique sorted values inside, each value is from the range(256) and each value appears in all of the combinations generated so far as close to 100 times as possible.
My problem is, how efficiently detect there are no more unique combinations of the values that can be created to stop the loop.
The problem appears when the script is going to an end and there still are available values left and len(available_values) > com_len but it isn't possible to create any new unique combinations that haven't appeared yet.
The code created so far:
import numpy as np
import random
com_len = 3
length = 256
counter = np.zeros(length)
values = range(length)
exclude_values = []
counter_expected = 100
done = []
mask = np.ones(len(np.array(values)), np.bool)
mask[exclude_values] = False
available_values = set(values) - set(exclude_values)
available_values = list(available_values)
ii = 0
while True:
"""print progress"""
ii = ii + 1
if not ii % 1000: print('.', end='')
#POSSIBLE CONDITION HERE
ex = random.sample(available_values, k=com_len)
ex.sort()
if ex in done: continue
done.append(ex)
counter[ex] = counter[ex] + 1
for l_ in ex:
if counter[l_] == counter_expected:
del available_values[available_values.index(l_)]
if len(available_values) < com_len:break
if all(counter[mask] == counter_expected): break
#OR HERE
NOTE: The script usually ends successfully because either len(available_values) < com_len or all(counter[mask] == counter_expected) condition breaks the While loop. Try running the script several times and at some point, you'll observe the script going into an infinite loop as len(available_values) >= com_len but there are no more new unique conditions available to be created so the counter is not increasing.
I need an efficient condition to stop the script. Using itertools.combinations here is not an option because available_values list may be long, i.e. 10k elements, at the beginning.
The brute-force would be using itertools.combinations when len(available_values) reaches a certain level and checking if there are any combinations that haven't been created yet but it's an ugly solution.
There might be a better way which doesn't come up to me but may to you. I'll be grateful for your help.