I want to check elements of an extremely long (over a billion elements) generator for a property. Obviously it is infeasible to check all the elements (that would take roughly 400 years). Currently, they are produced in an ordered fashion. In order for the small sample that I will have time to check to be more representative of the whole thing, I would like to access the generator randomly.

Is there any way to do this (as changing it to a list and doing random.shuffle is not possible)?

I'm trying to pick a random sample from the itertools.combinations result of a large input set:

itertools.combinations(a_large_set, 3)
Bhargav Rao
rlms
  • Can you make the generator produce the items in a random order? What are you actually doing with the billion elements? If you give us some more info we might be able to provide more help. Another approach would be to take the first N elements from the generator, and select a random subset of M of those to be checked for the property (assuming that checking the property is much more expensive than generating the items in the first place). – Tom Dalton Jan 31 '15 at 19:41
  • @TomDalton I'm checking the elements to see if they have a certain characteristic. This is a much more expensive operation than anything else being done in the program. If all else fails, I will try to make the generator produce elements in a random order. – rlms Jan 31 '15 at 19:44
  • What is your generator? There is probably a *different* approach possible. – Martijn Pieters Jan 31 '15 at 19:44
  • @MartijnPieters It is `itertools.combinations(a_large_set, 3)`. I'm now pursuing a different approach based on the `random_combination` recipe found in the itertools documentation. I was just wondering if there was an easier way involving a small modification to existing code. – rlms Jan 31 '15 at 19:46
  • @sweeneyrod: then just pick `random.sample(a_long_list, 3)` a few times, until you have a unique set of elements to fit your target sample size. – Martijn Pieters Jan 31 '15 at 19:47

2 Answers

Is there any way to do this

No.

L3viathan

You cannot skip ahead in a generator. There are ways to iterate over it and build a valid random sample, but you would have to put an upper limit on how many elements you iterate over. The result would then be a random sample of that prefix only, not a valid random selection from all the values the generator could produce.
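
For illustration, the bounded approach described above could look roughly like the following. This is only a sketch using reservoir sampling, a technique swapped in here as an example. The names reservoir_sample, limit and rng are made up for this sketch and do not come from any library:

```python
import itertools
import random


def reservoir_sample(iterable, k, limit=None, rng=random):
    """Return up to k items chosen uniformly from the first `limit` items."""
    if limit is not None:
        # Cap how far into the generator we are willing to iterate.
        iterable = itertools.islice(iterable, limit)
    reservoir = []
    for seen, item in enumerate(iterable, start=1):
        if len(reservoir) < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = item
    return reservoir


combos = itertools.combinations(range(1000), 3)
sample = reservoir_sample(combos, k=10, limit=100000)
```

The sample is uniform over the first 100000 combinations only. Because itertools.combinations() emits them in order, nearly all of those start with the first element of the input, which is exactly why this does not help here.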

If you are producing combinations of 3 elements from a large list, then just pick samples of 3:

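import random

def random_combinations_sample(lst, element_count, sample_size):
    # lst must be an indexable sequence; convert a set with list() first.
    result = set()
    while len(result) < sample_size:
        # Draw distinct positions, then sort them so each tuple keeps input order.
        indices = random.sample(range(len(lst)), element_count)
        sample = tuple(lst[i] for i in sorted(indices))
        result.add(sample)
    return list(result)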

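There is no need to produce all possible combinations if you only need a random set of combinations. As with itertools.combinations(), the elements within each combination appear in the order they have in the input list. Note that sample_size must not exceed the number of possible combinations, or the loop never terminates.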

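Instead of the following, which fails in any case because random.sample() needs a sequence rather than an iterator: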

random.sample(itertools.combinations(a_large_set, 3), 10)

you'd use

random_combinations_sample(list(a_large_set), 3, 10)
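
If you don't know in advance how many combinations you will have time to check, a lazy variant may be handier. The sketch below is one possible way to do it, not part of the recipe above. The name iter_random_combinations and the rng parameter are hypothetical, and it assumes Python 3.8+ for math.comb():

```python
import itertools
import math
import random


def iter_random_combinations(pool, r, rng=random):
    """Yield distinct random r-combinations of pool until all are used up."""
    pool = list(pool)  # sets are not indexable, so copy into a list
    total = math.comb(len(pool), r)
    seen = set()
    while len(seen) < total:
        # The sorted index tuple identifies a combination uniquely.
        indices = tuple(sorted(rng.sample(range(len(pool)), r)))
        if indices not in seen:
            seen.add(indices)
            yield tuple(pool[i] for i in indices)


a_large_set = set(range(2000))
# Take as many as you have time to check; here just the first 10.
for combo in itertools.islice(iter_random_combinations(a_large_set, 3), 10):
    print(combo)
```

Duplicates are tracked by index, so the elements themselves need not be hashable. Rejected draws become frequent only once most combinations have been seen, which will not happen if you check a small fraction of a billion.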
Martijn Pieters