0

I use two digits in an integer to represent one element. I.e. 3345512 represents the four elements [3,34,55,12]. Next, I repeatedly add one to the integer to get another sequence of elements. When generating the sequences like this, I get permutations of the same sequence, i.e. 3341255 = [3,34,12,55], which in my case is equivalent to the number 3345512 = [3,34,55,12]. So I would like to avoid permutations of a sequence I already encountered. I do not want to store the numbers as they grow very large (10^30 and more). I tried to use a bloom filter, but it could not handle the amount of elements. Is there a trivial solution to generate the sequences without permutations?

[EDIT] Here is a tiny python script, that should work. For the sake of better comprehensibility, I use a single digit with if s[idx] == 9: instead of if s[idx] == 99: If you have a more simple solution I will accept it as an answer.

import time
s = [1]
while True:
    idx = 0
    while not idx+1 == len(s) and not s[idx] < s[idx+1]:
        s[idx] = 1
        idx += 1
    if s[idx] == 9:
        s[idx] = 1
        s.append(1)
    else:
        s[idx] += 1
    print repr(s)
    time.sleep(0.7)
phobic
  • 914
  • 10
  • 24
  • 3
    If you want to avoid repeating sequences, which are permutations of each other, choose only one of the permutations as representative, e.g. the permutation where elements form non-decreasing sequence. – chill Dec 11 '13 at 14:30
  • 2
    Chill's idea is good if you start counting from 0. Otherwise, you may miss some sequences. For example, if you start counting from `5050`, you would skip over `5149` even though you never encountered `4951` in the first place. – Kevin Dec 11 '13 at 15:13
  • You want [combinations](http://en.wikipedia.org/wiki/Combinations) rather than permutations. – Jim Mischel Dec 11 '13 at 16:25
  • @Jim: The "counting multisubsets" section seems fitting, though there is no suggestion on how to implement it. – phobic Dec 11 '13 at 18:06

2 Answers2

2

What you're asking for is combinations rather than permutations. It looks like you're trying to get unique combinations of 4 items from a list of 100. That is, your possible values are 00 through 99.

You could generate all of the combinations with a nested loop, like this:

for (i = 0 to 96)
    for (j = i + 1 to 97)
        for (k = j + 1 to 98)
            for (l = k + 1 to 99)
                write i, j, k, l

That guarantees that you won't get the same combination.

You'll also note that in the generated combinations:

i < j < k < l

If you make your sequences always satisfy that inequality, then you won't get a duplicate combination.

So your example 3345512 would never be generated. It would be 3123455.

Given that, when you increment from 3123499, you don't go to 3123500, but rather to 3123536. Basically, if you overflow one position, you increment the next position, and the original position becomes (next position + 1). So 3999999 increments to 4050607.

Obviously, you can't do this with a simple integer increment. I would suggest using a 4-byte value and a little bit of logic.

Here's another way to do it.

Imagine your alphabet is only 4 characters, [0, 1, 2, 3]. There is a total of 16 possible unique combinations. You want the unique combinations of two items.

Now, consider a 4-bit number that can hold the values 0 through 15. You can map that number to the unique combinations so that the number 3 (0011 binary) corresponds to the combination 0, 1. That is, bit 0 is set and bit 1 is set. It should be clear that a number with 2 bits set corresponds to a unique 2-character combination. In the case of our 2-combinations in the alphabet of 4 items, we have:

0, 1  (0011)
0, 2  (0101)
0, 3  (1001)
1, 2  (0110)
1, 3  (1010)
2, 3  (1100)

With this, you can increment an integer, see if it has two bits set, and then translate from set bits to characters in your alphabet.

You can do exactly the same thing with selecting 4-combinations from a set of 100 characters. Working with a 100-bit number is a bit difficult but not impossible. Since you're just incrementing, you can do it with a pair of 64 bit numbers.

Determining how many bits are set is easy to do the naive way. The Bit Twiddling Hacks page shows several faster ways.

There are several ways to solve this problem, all of which can be parameterized to select unique k-combinations from a list of n items.

Jim Mischel
  • 131,090
  • 20
  • 188
  • 351
  • I would not like to hard code the problem like this, as I do not know how many loops I need. Though this looks like a very simple and good solution. – phobic Dec 11 '13 at 18:11
  • @phobic: The loops were just showing how you could generate combinations of four. It's pretty easy to come up with a generic method that will generate combinations of n items from a list of m. – Jim Mischel Dec 11 '13 at 19:17
  • If you can come up with something more readable than my script, I will accept your answer. Maybe I can use recursion and generators to simulate endlessly nested for loops. I can't think of anything else. – phobic Dec 11 '13 at 19:33
  • @phobic: I *think* the script you posted is similar to what I had in mind. I'm not a Python programmer, so I won't try to write my code in Python. I described my algorithm above when I discussed how to increment. – Jim Mischel Dec 11 '13 at 20:08
  • @phobic Here is an algorithm i wrote in c++ that uses this exact method to solve this exact problem. Maybe it will be of some use. [Here is a link to my explanation of the algorithm](http://stackoverflow.com/questions/27308438/how-to-cheaply-calculate-all-possible-length-r-combinations-of-n-possible-elem/27308439#27308439) – android927 Feb 03 '15 at 20:08
0

To check if the given sequence has already been generated and can be skipped you should generate "previous" permutation of the current sequence that would be generated if counting from zero, and compare it with the starting number.

You can find this permutation the following way:

If the numbers are in the increasing order - for example [3] [4] [5] [6] - you can stop, as there is no smaller permutation of those numbers.

If the sequence has some non-increasing part - for example [3] [5] [4] [8] [6] [7] [8] [9]. This sequence has two non-increasing parts. 5-4 and 8-6.

Locate the last number which is bigger than it's successor. It's the first 8.

[3] [5] [4] [8] [6] [7] [8] [9]

Find the biggest number behind the 8, that's actually smaller. It's 7.

[3] [5] [4] [8] [6] [7] [8] [9]

Move 7 to the position of 8, and leave all other numbers in decreasing order.

[3] [5] [4] [7] [9] [8] [8] [6]

Now if this sequence is smaller than the starting number - you know that it was not yet generated. Otherwise you can skip it.

Kuba
  • 2,986
  • 2
  • 26
  • 32
  • Sorry, I could not quite follow. Could you state explicitly what the purpose of the reordering is, and how you know your algorithm does achieve it? Maybe you could simply sort the sequence instead of using your algorithm to get the "smallest" permutation and compare it to the current sequence? – phobic Dec 11 '13 at 18:20