Two things occurred to me as the basis of a solution:
summing doesn't depend on order, which is actually a flaw in simple checksums (they don't catch changes in block order within a word), and
we can convert strings to summable numbers using their charcodes
Here's a function to do (2) :
charsum = function(s) {
var i, sum = 0;
for (i = 0; i < s.length; i++) {
sum += (s.charCodeAt(i) * (i+1));
}
return sum
}
Here's a version of (1) that computes an array hash by summing the charsum values:
array_hash = function(a) {
var i, sum = 0
for (i = 0; i < a.length; i++) {
var cs = charsum(a[i])
sum = sum + (65027 / cs)
}
return ("" + sum).slice(0,16)
}
Fiddle here: http://jsfiddle.net/WS9dC/11/
If we did a straight sum of the charsum values, then the array ["a", "d"] would have the same hash as the array ["b", "c"] - leading to undesired collisions. So based on using non-UTF strings, where charcodes go up to 255, and allowing for 255 characters in each string, then the max return value of charsum is 255 * 255 = 65025. So I picked the next prime number up, 65027, and used (65027 / cs) to compute the hash. I am not 100% convinced this removes collisions... perhaps more thought needed... but it certainly fixes the [a, d] versus [b, c] case.
Testing:
var arr1 = ['alpha','beta','gama'];
var arr2 = ['beta','alpha','gama'];
console.log(array_hash(arr1))
console.log(array_hash(arr2))
console.log(array_hash(arr1) == array_hash(arr2))
Outputs:
443.5322979371356
443.5322979371356
true
And testing a case that shows different hashes:
var arr3 = ['a', 'd'];
var arr4 = ['b', 'c'];
console.log(array_hash(arr3))
console.log(array_hash(arr4))
console.log(array_hash(arr3) == array_hash(arr4))
outputs:
1320.651443298969
1320.3792001649144
false
Edit:
Here's a revised version, which ignore duplicates from the arrays as it goes, and return the hash based on unique items only:
http://jsfiddle.net/WS9dC/7/
array_hash = function(a) {
var i, sum = 0, product = 1
for (i = 0; i < a.length; i++) {
var cs = charsum(a[i])
if (product % cs > 0) {
product = product * cs
sum = sum + (65027 / cs)
}
}
return ("" + sum).slice(0, 16)
}
testing:
var arr1 = ['alpha', 'beta', 'gama', 'delta', 'theta', 'alpha', 'gama'];
var arr2 = ["beta", "gama", "alpha", "theta", "delta", "beta"];
console.log(array_hash(arr1))
console.log(array_hash(arr2))
console.log(array_hash(arr1) === array_hash(arr2))
returns:
689.878503111701
689.878503111701
true
Edit
I've revised the answer above to account for arrays of words that have the same letters. We need these to return different hashes, which they now do:
var arr1 = ['alpha', 'beta']
var arr2 = ['alhpa', 'ateb']
The fix was to add a multiplier to the charsum func based on the char index:
sum += (s.charCodeAt(i) * (i+1));