Best way to hash ordered permutation of [1,9]

Question

I'm trying to implement a method to keep the visited states of the 8 puzzle from generating again.
My initial approach was to save each visited pattern in a list and do a linear check each time the algorithm wants to generate a child.
Now I want to do this in O(1) time through list access. Each pattern in 8 puzzle is an ordered permutation of numbers between 1 to 9 (9 being the blank block), for example 125346987 is:

1 2 5
3 4 6
_ 8 7

The number of all of the possible permutation of this kind is around 363,000 (9!). what is the best way to hash these numbers to indexes of a list of that size?

why not just use the number as hash? 125346987 in your example — Elisha, Dec 29 '15 at 12:16
`''.join(permutations)`, or `int(''.join(permutations))`, you should try out which one is faster — titus, Dec 29 '15 at 12:21
@Elisha: I want to use the hash number to access a list, I can't use that number as the hashed value. I need a function to map this number to a number between zero to 363000. — Amen, Dec 29 '15 at 13:00
You might describe _in the question_ what you find lacking from the suggestions of titus and Caridorc: is it _really_ crucial that no single "child generation" exceeds `O(1) time`, or would it suffice to argue `O(1)` [amortised time](https://en.wikipedia.org/wiki/Amortized_time), for some significant number of operations to amortise over? — greybeard, Jan 01 '16 at 09:35
Actually what is lacking from the answer of Paul Hankin is a solid documentation. And about using python set, I DID mention in the question I want a hash function. — Amen, Jan 01 '16 at 10:06

Paul Hankin · Answer 1 · 2016-01-07T12:17:22.843

7

You can map a permutation of N items to its index in the list of all permutations of N items (ordered lexicographically).

Here's some code that does this, and a demonstration that it produces indexes 0 to 23 once each for all permutations of a 4-letter sequence.

import itertools

def fact(n):
    r = 1
    for i in xrange(n):
        r *= i + 1
    return r

def I(perm):
    if len(perm) == 1:
        return 0
    return sum(p < perm[0] for p in perm) * fact(len(perm) - 1) + I(perm[1:])

for p in itertools.permutations('abcd'):
    print p, I(p)

The best way to understand the code is to prove its correctness. For an array of length n, there's (n-1)! permutations with the smallest element of the array appearing first, (n-1)! permutations with the second smallest element appearing first, and so on.

So, to find the index of a given permutation, see count how many items are smaller than the first thing in the permutation and multiply that by (n-1)!. Then recursively add the index of the remainder of the permutation, considered as a permutation of (n-1) elements. The base case is when you have a permutation of length 1. Obviously there's only one such permutation, so its index is 0.

A worked example: [1324].

[1324]: 1 appears first, and that's the smallest element in the array, so that gives 0 * (3!)
Removing 1 gives us [324]. The first element is 3. There's one element that's smaller, so that gives us 1 * (2!).
Removing 3 gives us [24]. The first element is 2. That's the smallest element remaining, so that gives us 0 * (1!).
Removing 2 gives us [4]. There's only one element, so we use the base case and get 0.

Adding up, we get 0*3! + 1*2! + 0*1! + 0 = 1*2! = 2. So [1324] is at index 2 in the sorted list of 4 permutations. That's correct, because at index 0 is [1234], index 1 is [1243], and the lexicographically next permutation is our [1324].

edited Jan 07 '16 at 12:17

answered Dec 29 '15 at 13:23

Paul Hankin

54,811
11
92
118

Thank you for your answer, would you kindly provide a good documentation for this method? – Amen Dec 29 '15 at 13:58
Perhaps you can convince yourself that the function is correct by using induction over the length of the permutation. – Paul Hankin Dec 29 '15 at 14:23
It's correct alright :) but I want a solid documentation or perhaps a precise name for it! – Amen Dec 29 '15 at 15:19
2

@Amen [ranking/unranking](https://books.google.com/books?id=7XUSn0IKQEgC&pg=PA449&lpg=PA449&dq=ranking+unranking&source=bl&ots=z8aDQXGP3j&sig=FSnL9rrgSHFllFZrNgB_7pqxDT0&hl=en&sa=X&ved=0ahUKEwjFyIWF4InKAhUGbR4KHQ7VD2M4ChDoAQgbMAA#v=onepage&q=ranking%20unranking&f=false) – David Eisenstat Jan 01 '16 at 23:19
@DavidEisenstat where does `sum(p < perm[0] for p in perm)` come from? In the reference it seems that the algorithm is `(perm[0]-1)` (though this doesn't work and your version does, but I'm quite curious, I wouldn't have imagined there was a clean hashing function). – Carlos Navarro Astiasarán Jan 07 '16 at 10:48
@CarlosNavarro if you look at the page that David links to, you'll see a sentence about "relabeling the elements". The code in my answer `(p – Paul Hankin Jan 07 '16 at 12:01
@CarlosNavarro It's a Python hack. Correct me if I'm wrong: `p < perm[0] for p in perm` generates a list of True/False. True/False in Python is equivalent to 1/0. So the sum of 1's and 0's. Semantically, count of all values in the `perm` less that `perm[0]`. – SGM1 Jan 07 '16 at 19:13

abastion · Answer 2 · 2016-01-05T21:18:31.790

I believe you're asking for a function to map permutations to array indices. This dictionary maps all permutations of numbers 1-9 to values from 0 to 9!-1.

import itertools
index = itertools.count(0)
permutations = itertools.permutations(range(1, 10))

hashes = {h:next(index) for h in permutations}

For example, hashes[(1,2,5,3,4,6,9,8,7)] gives a value of 1445.

If you need them in strings instead of tuples, use:

permutations = [''.join(x) for x in itertools.permutations('123456789')]

or as integers:

permutations = [int(''.join(x)) for x in itertools.permutations('123456789')]

score 1 · Answer 3 · answered Dec 29 '15 at 12:27

1

It looks like you are only interested in whether or not you have already visited the permutation.

You should use a set. It grants the O(1) look-up you are interested in.

answered Dec 29 '15 at 12:27

Caridorc

6,222
2
31
46

Thank you for your answer, this will surely improve my initial approach but still isn't O(1) on worst case. – Amen Dec 29 '15 at 13:00

Carlos Navarro Astiasarán · Answer 4 · 2016-01-07T09:13:23.853

Use

I've developed a heuristic function for this specific case. It is not a perfect hashing, as the mapping is not between [0,9!-1] but between [1,767359], but it is O(1).

Let's assume we already have a file / reserved memory / whatever with 767359 bits set to 0 (e.g., mem = [False] * 767359). Let a 8puzzle pattern be mapped to a python string (e.g., '125346987'). Then, the hash function is determined by:

def getPosition( input_str ):
data = []
opts = range(1,10)
n = int(input_str[0])
opts.pop(opts.index(n))
for c in input_str[1:len(input_str)-1]:
    k = opts.index(int(c))
    opts.pop(k)
    data.append(k)
ind = data[3]<<14 | data[5]<<12 | data[2]<<9 | data[1]<<6 | data[0]<<3 | data[4]<<1 | data[6]<<0
output_str = str(ind)+str(n)
output = int(output_str)
return output

I.e., in order to check if a 8puzzle pattern = 125346987 has already been used, we need to:

pattern = '125346987'
pos = getPosition(pattern)
used = mem[pos-1] #mem starts in 0, getPosition in 1.

With a perfect hashing we would have needed 9! bits to store the booleans. In this case we need 2x more (767359/9! = 2.11), but recall that it is not even 1Mb (barely 100KB).

Note that the function is easily invertible.

Check

I could prove you mathematically why this works and why there won't be any collision, but since this is a programming forum let's just run it for every possible permutation and check that all the hash values (positions) are indeed different:

def getPosition( input_str ):
data = []
opts = range(1,10)
n = int(input_str[0])
opts.pop(opts.index(n))
for c in input_str[1:len(input_str)-1]:
    k = opts.index(int(c))
    opts.pop(k)
    data.append(k)
ind = data[3]<<14 | data[5]<<12 | data[2]<<9 | data[1]<<6 | data[0]<<3 | data[4]<<1 | data[6]<<0
output_str = str(ind)+str(n)
output = int(output_str)
return output


#CHECKING PURPOSES
def addperm(x,l):
    return [ l[0:i] + [x] + l[i:]  for i in range(len(l)+1) ]

def perm(l):
    if len(l) == 0:
        return [[]]
    return [x for y in perm(l[1:]) for x in addperm(l[0],y) ]

#We generate all the permutations
all_perms = perm([ i for i in range(1,10)])
print "Number of all possible perms.: "+str(len(all_perms)) #indeed 9! = 362880

#We execute our hash function over all the perms and store the output.
all_positions = [];
for permutation in all_perms:
    perm_string = ''.join(map(str,permutation))
    all_positions.append(getPosition(perm_string))

#We wan't to check if there has been any collision, i.e., if there
#is one position that is repeated at least twice.
print "Number of different hashes: "+str(len(set(all_positions))) 
#also 9!, so the hash works properly.

How does it work?

The idea behind this has to do with a tree: at the beginning it has 9 branches going to 9 nodes, each corresponding to a digit. From each of these nodes we have 8 branches going to 8 nodes, each corresponding to a digit except its parent, then 7, and so on.

We first store the first digit of our input string in a separate variable and pop it out from our 'node' list, because we have already taken the branch corresponding to the first digit.

Then we have 8 branches, we choose the one corresponding with our second digit. Note that, since there are 8 branches, we need 3 bits to store the index of our chosen branch and the maximum value it can take is 111 for the 8th branch (we map branch 1-8 to binary 000-111). Once we have chosen and store the branch index, we pop that value out, so that the next node list doesn't include again this digit.

We proceed in the same way for branches 7, 6 and 5. Note that when we have 7 branches we still need 3 bits, though the maximum value will be 110. When we have 5 branches, the index will be at most binary 100.

Then we get to 4 branches and we notice that this can be stored just with 2 bits, same for 3 branches. For 2 branches we will just need 1bit, and for the last branch we don't need any bit: there will be just one branch pointing to the last digit, which will be the remaining from our 1-9 original list.

So, what we have so far: the first digit stored in a separated variable and a list of 7 indexes representing branches. The first 4 indexes can be represented with 3bits, the following 2 indexes can be represented with 2bits and the last index with 1bit.

The idea is to concatenate all this indexes in their bit form to create a larger number. Since we have 17bits, this number will be at most 2^17=131072. Now we just add the first digit we had stored to the end of that number (at most this digit will be 9) and we have that the biggest number we can create is 1310729.

But we can do better: recall that when we had 5 branches we needed 3 bits, though the maximum value was binary 100. What if we arrange our bits so that those with more 0s come first? If so, in the worst case scenario our final bit number will be the concatenation of:

100 10 101 110 111 11 1

Which in decimal is 76735. Then we proceed as before (adding the 9 at the end) and we get that our biggest possible generated number is 767359, which is the ammount of bits we need and corresponds to input string 987654321, while the lowest possible number is 1 which corresponds to input string 123456789.

Just to finish: one might wonder why have we stored the first digit in a separate variable and added it at the end. The reason is that if we had kept it then the number of branches at the beginning would have been 9, so for storing the first index (1-9) we would have needed 4 bits (0000 to 1000). which would have make our mapping much less efficient, as in that case the biggest possible number (and therefore the amount of memory needed) would have been

1000 100 10 101 110 111 11 1

which is 1125311 in decimal (1.13Mb vs 768Kb). It is quite interesting to see that the ratio 1.13M/0.768K = 1.47 has something to do with the ratio of the four bits compared to just adding a decimal value (2^4/10 = 1.6) which makes a lot of sense (the difference is due to the fact that with the first approach we are not fully using the 4 bits).

score 1 · Answer 5 · answered Jan 07 '16 at 08:50

A space as well lookup efficient structure for this problem is a trie type structure, as it will use common space for lexicographical matches in any permutation. i.e. the space used for "123" in 1234, and in 1235 will be the same.

Lets assume 0 as replacement for '_' in your example for simplicity.

Storing

Your trie will be a tree of booleans, the root node will be an empty node, and then each node will contain 9 children with a boolean flag set to false, the 9 children specify digits 0 to 8 and _ .
You can create the trie on the go, as you encounter a permutation, and store the encountered digits as boolean in the trie by setting the bool as true.

Lookup

The trie is traversed from root to children based on digits of the permutation, and if the nodes have been marked as true, that means the permutation has occured before. The complexity of lookup is just 9 node hops.

Here is how the trie would look for a 4 digit example :

Python trie This trie can be easily stored in a list of booleans, say myList. Where myList[0] is the root, as explained in the concept here :

https://webdocs.cs.ualberta.ca/~holte/T26/tree-as-array.html

The final trie in a list would be around 9+9^2+9^3....9^8 bits i.e. less than 10 MB for all lookups.

∑(i=1 to 8) 9^i = 9 · (9⁸ - 1)/8 (geometric sequence sum). But you need only 9⁸ list items, because only the values of the leaves of this trie matter (you can use the implicit trie storage as just a method of indexing the list). While 9⁸ = 9·9·9·9·9·9·9·9 > 9·8·7·6·5·4·3·2 = 9! and therefore is less space-efficient than using permutation rank, the indexing obviously takes nine simple O(1) steps and needs no non-trivial algorithms or data structures. — Palec, Jan 07 '16 at 10:25

score 0 · Answer 6 · answered Jan 02 '16 at 19:31

Notice if you type hash(125346987) it returns 125346987. That is for a reason, because there is no point in hashing an integer to anything other than an integer.

What you should do, is when you find a pattern add it to a dictionary rather than a list. This will provide the fast lookup you need rather than traversing the list like you are doing now.

So say you find the pattern 125346987 you can do:

foundPatterns = {}
#some code to find the pattern
foundPatterns[1] = 125346987
#more code
#test if there?
125346987 in foundPatterns.values()
True

score 0 · Answer 7 · answered Jan 02 '16 at 23:19

If you must always have O(1), then seems like a bit array would do the job. You'd only need to store 363,000 elements, which seems doable. Though note that in practice it's not always faster. Simplest implementation looks like:

Create data structure

visited_bitset = [False for _ in xrange(373000)]

Test current state and add if not visited yet

if !visited[current_state]:
    visited_bitset[current_state] = True

score 0 · Answer 8 · edited May 23 '17 at 12:07

Paul's answer might work.

Elisha's answer is perfectly valid hash function that would guarantee that no collision happen in the hash function. The 9! would be a pure minimum for a guaranteed no collision hash function, but (unless someone corrects me, Paul probably has) I don't believe there exists a function to map each board to a value in the domain [0, 9!], let alone a hash function that is nothing more that O(1).

If you have a 1GB of memory to support a Boolean array of 864197532 (aka 987654321-12346789) indices. You guarantee (computationally) the O(1) requirement.

Practically (meaning when you run in a real system) speaking this isn't going to be cache friendly but on paper this solution will definitely work. Even if an perfect function did exist, doubt it too would be cache friendly either.

Using prebuilts like set or hashmap (sorry I haven't programmed Python in a while, so don't remember the datatype) must have an amortized 0(1). But using one of these with a suboptimal hash function like n % RANDOM_PRIME_NUM_GREATER_THAN_100000 might give the best solution.

Do you agree that permutations of {1…9} represent all possible board states (even the ones unreachable within the rules of the puzzle)? There are exactly 9! of them and you can get their index in lexicographical order in time linear with the number of the elements (so here O(9), which is O(1)). This is called permutation rank and is a well-known problem. And here you have the succinct hash that does not need more space than 9! slots and is collision-free. — Palec, Jan 07 '16 at 09:52
@Palec Honestly I didn't understand the algorithm, nor did I read the proof. I just read Paul's edit for the example where the algorithm finally makes sense to me. — SGM1, Jan 07 '16 at 19:36

score 0 · Accepted Answer · edited May 23 '17 at 11:59

0

First. There is nothing faster than a list of booleans. There's a total of 9! == 362880 possible permutations for your task, which is a reasonably small amount of data to store in memory:

visited_states = [False] * math.factorial(9)

Alternatively, you can use array of bytes which is slightly slower (not by much though) and has a much lower memory footprint (by a power of magnitude at least). However any memory savings from using an array will probably be of little value considering the next step.

Second. You need to convert your specific permutation to it's index. There are algorithms which do this, one of the best StackOverflow questions on this topic is probably this one:

Finding the index of a given permutation

You have fixed permutation size n == 9, so whatever complexity an algorithm has, it will be equivalent to O(1) in your situation.

However to produce even faster results, you can pre-populate a mapping dictionary which will give you an O(1) lookup:

all_permutations = map(lambda p: ''.join(p), itertools.permutations('123456789'))
permutation_index = dict((perm, index) for index, perm in enumerate(all_permutations))

This dictionary will consume about 50 Mb of memory, which is... not that much actually. Especially since you only need to create it once.

After all this is done, checking your specific combination is done with:

visited = visited_states[permutation_index['168249357']]

Marking it to visited is done in the same manner:

visited_states[permutation_index['168249357']] = True

Note that using any of permutation index algorithms will be much slower than mapping dictionary. Most of those algorithms are of O(n²) complexity and in your case it results 81 times worse performance even discounting the extra python code itself. So unless you have heavy memory constraints, using mapping dictionary is probably the best solution speed-wise.

Addendum. As has been pointed out by Palec, visited_states list is actually not needed at all - it's perfectly possible to store True/False values directly in the permutation_index dictionary, which saves some memory and an extra list lookup.

edited May 23 '17 at 11:59

Community

1
1

answered Jan 06 '16 at 12:04

Lav

2,204
12
23

1

Do you realize that the dictionary converting permutation to its index is [actually a hash](http://stackoverflow.com/a/9022835/2157640)? Thus your answer is just a more complicated (and slower) variant of: “Use the built-in dictionary and don’t care.” – Palec Jan 07 '16 at 09:42
1

BTW there are linear-time algorithms for permutation ranking, although AFAIK none in lexicographical order (which does not really matter to your problem). E. g. the one from my supervisor (just google “permutation rank”): [Linear-time ranking of permutations](http://mj.ucw.cz/papers/rank.ps) – Palec Jan 07 '16 at 10:06
The hash table lookup is O(n), not O(1) since your key is of size O(n). – Paul Hankin Jan 07 '16 at 12:29
I fail to see how this is the accepted answer. This answer does not meet the OP's stated requirement a Dictionary using an amortized `O(1)` lookup, not a perfect O(1). So many great answers, especially Paul's answer. – SGM1 Jan 07 '16 at 19:07
@SGM1 Hash table lookup efficiency depends on the ratio between "hit" and "miss" lookups. It's only "amortized" in a generic situation. In this case, **all** lookups will be "hit", so the "amortized" argument does not apply. Besides, why do you think answer was accepted for lookup table idea? It's perfectly possible OP is actually using one of permutation index algorithms. – Lav Jan 08 '16 at 10:07
@Palec, you might have noticed I have stated quite clearly that in this case, where the size of permutation is fixed, **any** algorithm will formally be O(1) for OP's needs. So whatever algorithm you choose will only affect the performance **linearly**. Try to come with an algorithm that will be at least 1% as fast as Python pre-populated dictionary lookup, and then there's something to discuss. – Lav Jan 08 '16 at 10:13
1

My suggestion is simple: Make `visited_states` a dictionary and use `visited_states['168249357']` instead of `visited_states[permutation_index['168249357']]`. Uses less memory, is guaranteed to be faster. – Palec Jan 08 '16 at 10:17
@Palec, Actually... that makes sense. :-) – Lav Jan 08 '16 at 10:18
Still, this is only amortized O(1). Hits and misses talk into the complexity, sure, but then collisions come into play. You do not have control over the hash function the built-in Python dictionary uses to hash strings like `'168249357'`. – Palec Jan 08 '16 at 10:24
@Palec That might be true, but I'm willing to bet Python hash lookup code is still way more efficient than anything OP has the resources to implement, even discounting the difference in performance between compiled binary and interpreted bytecode. Besides I'm pretty sure Python hash lookup is actually scalable - I have done a **lot** of performance tests on Python dict, and there's no visible performance hit for dicts much larger than the one used here. – Lav Jan 08 '16 at 10:31
@Lav How do you think the keys are stored in a dictionary :) – SGM1 Jan 08 '16 at 16:07
I'm not arguing whether this is a practical solution. I'm arguing this is not one of the better solution. On the 8 puzzles that require quite a few moves (up to 31 apparently) Paul's solution MAY (not sure because the hash, even though is O(1) is still computational strenuous) may, and probably will, out perform this one. On paper, Paul's answer blows this one out of the water. – SGM1 Jan 08 '16 at 16:29
@SGM1 What matters is not how keys are stored, but how hash lookup is implemented. As for performance, please, please always do actual performance tests before making any claims. That's what I did, and Paul's code is roughly 100 times slower than `hash(''.join(permutation_list))`, and let me tell you, direct dictionary lookup is even faster without the overhead of an extra function call. – Lav Jan 08 '16 at 16:42
Paul's answer can easily be optimized, Yours cannot. Join is a very expensive call. Pauls is making his own factorial function, can be replace by an optimized one (still multiplication is expensive I agree). He could of course using a for loop instead of the `p < perm[0] for p in perm`, pretty sure this is slower than pure iteration to count. And he can replace the call to `len(perm)` with a second funtion parameter. – SGM1 Jan 08 '16 at 16:46
Actually for factorial, since there are only 8 factorial's he is using, He can pre compute them and use directly. `myFactorial[1] = 1; myFacotiral[2] =2....`. "What matters is not how keys are stored", this is a dictionary within itself, therefore there are TWO areas were hashing and collision might happen. When the OP is asking for `O(1)` I would think he looking for something that would look good on paper. – SGM1 Jan 08 '16 at 16:56
If you really want to test it, throw both algorithms against an impossible puzzle and see which on exhausts first. I'm honestly unsure which would be faster on modern hardware. – SGM1 Jan 08 '16 at 17:01
@SGM1 I threw both algorithms against full list of '123456789' permutations. Built-in `hash()` function performed in 0.13 seconds. Paul's code took more than 11 seconds. Paul's code may be improved, but until you actually improve it and run real tests on it - please don't make any bold claims. They might backfire. I did a lot of performance tests against Python dicts, so please, assume for just a second that I actually know what I'm talking about while you're talking pure theory. Go do some tests, then come back and we'll have something to talk. – Lav Jan 08 '16 at 17:19
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/100172/discussion-between-lav-and-sgm1). – Lav Jan 08 '16 at 17:21

Best way to hash ordered permutation of [1,9]

9 Answers9

Use

Check

How does it work?

Create data structure

Test current state and add if not visited yet