I am experimenting with unrolling a few nested loops for (potentially) better performance at the expense of memory. In my scenario, I would end up with a list of about 300M elements (tuples), which I'd have to yield in (more or less) random order.
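To make the scenario concrete, here is a minimal sketch of the kind of unrolling I mean; the three nested ranges and their bounds are just placeholders for my actual loops:

import itertools

# Hypothetical nested loops, unrolled into one flat list of tuples.
# With these placeholder bounds the result is 1000 * 600 * 500 = 300M tuples,
# so materializing it costs many GByte of memory.
unrolled_tuples = list(itertools.product(range(1000), range(600), range(500)))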
At this order of magnitude, random.shuffle(some_list) really is not the way to go anymore.
The example below illustrates the issue. Be aware that on x86_64 Linux with CPython 3.6.4, it will eat about 11 GByte of memory.
def get_random_element():
    # Just materializing the 300M integers is what eats the ~11 GByte.
    some_long_list = list(range(0, 300000000))
    # A random.shuffle(some_long_list) here is the step that becomes
    # impractical at this size; as written, the elements are yielded in order.
    for random_item in some_long_list:
        yield random_item
My thinking so far is to simply generate one random index per iteration and yield the element at that index, indefinitely. This may yield certain elements multiple times and skip others entirely, which would be a trade-off worth considering.
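Roughly what I mean, as a minimal sketch (the generator name and the with-replacement sampling are just illustrative):

import random

def yield_with_replacement(some_long_list):
    # One random index per iteration: constant extra memory per draw,
    # but elements may repeat and some may never be yielded at all.
    while True:
        yield some_long_list[random.randrange(len(some_long_list))]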
What other options do I have, within reasonable bounds of memory and CPU time, to ideally yield every element of the list exactly once?