Wow, @wildplasser's answer is actually super smart. To expand a bit:
Any number can be decomposed in a unique fashion into prime numbers (this is known as the fundamental theorem of arithmetic). His answer relies on that, by building a number for which the input array is a representation of the prime factor decomposition. As multiplication is commutative, the exact order of the elements in the array is of no importance, but a given number is associated with one (and only one) bag of elements.
His solution can be extended to arbitrary values, e.g. in Python:
import itertools
import operator
from functools import reduce   # reduce is not a builtin in Python 3

class primes(object):
    """A lazily extended, 0-indexed sequence of prime numbers."""
    def __init__(self):
        self.primes = [2, 3, 5, 7, 11]
        self.stream = itertools.count(13, 2)   # odd candidates

    def __getitem__(self, i):
        while i >= len(self.primes):
            n = next(self.stream)
            # skip composites: trial division by the known primes up to sqrt(n)
            while any(n % q == 0 for q in self.primes if q * q <= n):
                n = next(self.stream)
            self.primes.append(n)
        return self.primes[i]

def prod(itr):
    return reduce(operator.mul, itr, 1)

p = primes()

def hash(array):   # note: shadows the built-in hash()
    # product of the i-th prime for every element i of the bag
    return prod(p[i] for i in array)
With the expected results:
>>> hash([1,2,2,3,5])
6825
>>> hash([5,3,2,2,1])
6825
Here, 6825 = 3^1 x 5^2 x 7^1 x 13^1, as 3 is the '1' prime (0-indexed), 5 the '2', etc.:
>>> 3**1 * 5**2 * 7**1 * 13**1
6825
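Because the factorization is unique, the bag can even be recovered from the hash by factoring it back, which is a nice way to convince yourself that no two different bags share a value. A minimal sketch (the unhash name is mine, reusing the p instance defined above):

def unhash(h):
    # peel off the prime factors of h and collect their indices
    bag = []
    i = 0
    while h > 1:
        while h % p[i] == 0:
            bag.append(i)
            h //= p[i]
        i += 1
    return bag

>>> unhash(6825)
[1, 2, 2, 3, 5]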
Building the number itself is O(n) multiplications, as long as the final result remains in the domain of the int type you are using (unfortunately, I suspect it might get out of hand quite quickly). Building the series of primes with a sieve of Eratosthenes is asymptotically O(N * log log N), where N is the value of the m-th prime and m is the largest element of the array (the incremental trial division used above is asymptotically slower, but simpler). As asymptotically N ~ m log m, this gives an overall complexity of O(n + m * log m * log log(m * log m)).
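To get a sense of how quickly the product grows, you can check the bit length of the hash of a moderately large bag (using the hash function above; the particular bag is just an arbitrary example of mine). The hash of {0, 1, ..., 99}, i.e. the product of the first 100 primes, already runs to several hundred bits:

# how many bits the hash needs for the bag {0, 1, ..., 99}
print(hash(range(100)).bit_length())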
Using a similar approach, instead of taking the prime number decomposition, we can also consider the array to be the representation of a number in some base. To be consistent, this base has to be strictly larger than the largest count of identical elements (e.g. for [5, 3, 3, 2, 1], the base must be > 2 because there are two 3s). To be on the safe side, since that count can never exceed the length of the array, you can write:
def hash2(array):
    # len(array) + 1 is strictly larger than any element's multiplicity
    n = len(array) + 1
    return sum(n**i for i in array)
>>> hash2([1,5,3,2,2])
8070
>>> hash2([2,1,5,2,3])
8070
You can improve on this by first computing the largest count of identical elements in the array and using it to pick a smaller base (see the sketch below), but the hash2 function is a real hash only when every array is hashed with the same base. So the prime number decomposition is probably the safer choice if you work with arrays of varying length and composition, as it will ALWAYS return the same unique integer per bag of numbers.
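For completeness, a sketch of that improvement, with the caveat above in mind (hash3 and the use of collections.Counter are my additions, not part of the original answer):

from collections import Counter

def hash3(array):
    # base strictly larger than the largest multiplicity in this particular bag
    base = max(Counter(array).values(), default=1) + 1
    return sum(base**i for i in array)

Because the base now depends on the bag itself, two different bags may end up hashed in different bases and could collide; that is exactly the caveat above, namely that the base-representation trick is only a true hash when the same base is used for every array.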