6

I'd like to create a random permutation of the numbers [1,2,...,N] where N is a big number. So I don't want to store all elements of the permutation in memory, but rather iterate over the elements of my particular permutation without holding former values in memory.

Any idea how to do that in Python?

Nikos M.
  • 8,033
  • 4
  • 36
  • 43
Gere
  • 12,075
  • 18
  • 62
  • 94
  • This might be what you are looking for: http://stackoverflow.com/questions/976882/shuffling-a-list-of-objects-in-python ? – Asking Questions Jun 03 '15 at 09:11
  • It does a permutation, but I specifically want to avoid storing data of size N in memory. – Gere Jun 03 '15 at 09:15
  • 2
    Bad news : you cant do it without storing data :D. you need to know wich number you did generate,unless you have a time travel machine :D. – yahya el fakir Jun 03 '15 at 09:15
  • 1
    possible duplicate http://stackoverflow.com/questions/28990820/iterator-to-produce-unique-random-order – Nikos M. Jun 03 '15 at 09:19
  • 5
    @yahyaelfakir, not true, it can be done, see related question and answers therein – Nikos M. Jun 03 '15 at 09:19
  • see [Abacus](https://github.com/foo123/Abacus) a combinatorics library for js/php/python (currently only js, php/python in progress), it can compute efficiently permutations etc using only the index and so on (ps i'm author) – Nikos M. Jun 03 '15 at 09:37
  • @NikosM. Hmm thanks for your information i'll try to look at this, it just was impossible for me *by intuition*. – yahya el fakir Jun 03 '15 at 09:37
  • @yahyaelfakir, the whole point is a way to re-encode the indices (assuming the iterator can fetch or construct an item by its index). If you see it as such even a random number generator `mod N`, will do (at least for some orderings) – Nikos M. Jun 03 '15 at 09:49
  • @NikosM.: It seems the duplicate question gives pretty good advice. So which `mod N` method would you recommend? Do both `A*k mod N` and `A^k mod N work`? Of course neither can give all permutations? Is any method more random? – Gere Jun 04 '15 at 09:19
  • @Gerenuk, no the `mod N` method will not return all possible orderings (mainly cyclic orderings, for some N where all orderings are ceyclic orderings it can generate all possible orderings), **but** one can use this approach sometimes then another approach e.g using a more general polynomial which will return more orderings and so on. – Nikos M. Jun 04 '15 at 11:07
  • @Gerenuk Eventually one will need a (at most) N-size vector to generate consistently all possible orderings, but the whole point is that this vector can occupy much less space than having N combinatorial objects in memory. So use whatever suits your use case, these are the options (per my analysis) – Nikos M. Jun 04 '15 at 11:09
  • @Gerenuk, uodated my analysis on that page with a rejection-based method which requires only N bits to be storted in memory (literally) for all cases, but has a longer worst-case time complexity (O(N) on average though) – Nikos M. Jun 04 '15 at 11:22

2 Answers2

6

One possibility is to use an encryption. Since encryption is reversible, i.e. one-to-one, for a given key you will get back the same numbers you encrypt but in a different order.

You need a block cypher with a block size large enough to include your maximum N. Use DES in ECB mode for N = 2^64 - 1. Use AES in ECB mode for N = 2^128 - 1. For other sizes, either use Hasty Pudding cipher, which has variable block size, or write your own simple Feistel cipher. I assume that you just need a shuffle, not a cryptographically secure shuffle.

If the output is greater than N, then just re-encrypt until it is less than N, the 1-to-1 property ensures that the chain of large numbers is also unique.

There is no need to store the entire array in memory, each number can be encrypted as needed. Just the key and the cipher algorithm are needed. One slight complication is that block ciphers work on [0 ... N-1]; you might need some extra code to deal with the extremes.

rossum
  • 15,344
  • 1
  • 24
  • 38
  • Seems like a working approach, but is there some simpler operation than standard encryption algorithms? I'd like to code it in Python myself. Also they are usually fixed to like 2^64 block sizes. It seems `a*n mod p` is a good start, though. Thanks for pointing out good encryption methods, which might come in handy! – Gere Jun 04 '15 at 09:12
  • 1
    You can write a very simple Feistel cipher for any even block size. It won't be cryptographically secure, but with four rounds it will be fast. Four rounds is the minimum to make it reasonable random. – rossum Jun 04 '15 at 10:47
0

This is a generic issue and rather not a Python-specific. In most languages, even when iterators are used for using structures, the whole structure is kept in memory. So, iterators are mainly used as "functional" tools and not as "memory-optimization" tools.

In python, a lot of people end up using a lot of memory due to having really big structures (dictionaries etc.). However, all the variables-objects of the program will be stored in memory in any way. The only solution is the serialization of the data (save in filesystem, Database etc.).

So, in your case, you could create a customized function that would create the list of the permutations. But, instead of adding each element of the permutation to a list, it would save the element either in a file (or in a database with the corresponding structure). Then, you would be able to retrieve one-by-one each permutation from the file (or the database), without bringing the whole list in memory.

However, as mentioned before, you will have to always know in which permutation you currently are. In order to avoid retrieving all the created permutations from Database (which would create the same bottleneck), you could have an index for each place holding the symbol used in the previously generated permutation (and create the permutations adding the symbols and a predefined sequence).

Dimos
  • 8,330
  • 1
  • 38
  • 37
  • 2
    Using an encryption, the encryption key acts as the index to the permutation. Each key has its own permutation. Change the key and you change the permutation. – rossum Jun 04 '15 at 15:37
  • yes, the encryption is a really good choice for the indexing. Thanks! – Dimos Jun 04 '15 at 21:11