1

Now I have some sets of integers, say:

   set1 = {int1, int2, int3};
   set2 = {int2, int3, int1};
   set3 = {int1, int4, int2};

The order or the numbers is not taken into consideration, so set1 and set2 are the same, while set3 are not with the other two.

Now I want to generate a unique key for these sets to distinguish them, in that way, set1 and set2 should generate the same key.

I think this for a while, thoughts as sum up the integers came to my mind but can be easily proved wrong. Sort the set and do

key = n1 + n2*2^16 + n3*2^32

may be a possible way but I wonder if this can be solved more elegantly. The key can be either integer or string.

So any one has some idea about solving this as fast as possible? Or any reading material is welcome.

More info: The numbers are in fact colors so each integer is less than 0xffffff

fabregaszy
  • 496
  • 6
  • 23

3 Answers3

0
  • If numbers of your set is not so large, I think hashing each set into one string can be one of proper solution.
  • Then they are lager ones, you can make it small ones by mod function or whatever. And by this, they can be dealed with in the same way.

Hope this will help your solution if there is no better idea.

  • the set is not large but the number of sets is large so I have to calculate the hash for tons of time. Hash into one string may cause a problem: {1, 2, 34} may generate the same key with {12, 3, 4}? – fabregaszy Mar 10 '15 at 05:52
0

I think a key of practical size can only be a hash value - there will always be a few pairs of inputs that hash to the same key, but you can make this unlikely.

I think the idea of sorting and then applying a standard hash function is good, but I don't like your hash multipliers. If arithmetic is mod 2^32, then multiplying by 2^32 is multiplying by zero. If it is mod 2^64, then multiplying by 2^32 will lose the top 32 bits of the input.

I would use a hash function like that described in Why chose 31 to do the multiplication in the hashcode() implementation ?, where you keep a running total, multiplying the hash value by some odd number before you add then next item into it. Multiplying by an odd number mod 2^n will at least not lose information immediately. I would suggest 131, but Java has a tradition of using 31.

Community
  • 1
  • 1
mcdowella
  • 19,301
  • 2
  • 19
  • 25
0

If these were small integers (all within the range(0,63) for example) then you could represent each set as a bitstring (1 for any integer that's present in the set; 0 for any that's absent). For sparse sets of large integers this would be horrendously expensive in terms of storage/memory).

One other method that comes to mind would be to sort the set and form the key as the concatenation of each number's digital representation (separated by some delimiter). So the set {2,1,3} -> "1/2/3" (using "/" as the delimiter) and {30,1,2,4} => "1/2/4/30"

I suppose you could also use a hybrid approach. All elements < 63 are encoded into a hex string and all others are encoded into a string as described. Then your final resulting key is formed by: HEXxA/B/c ... (with the "x" separating the small int hex string from the larger ints in the set).

Jim Dennis
  • 17,054
  • 13
  • 68
  • 116
  • Thanks! Both ways are very heruistic. – fabregaszy Mar 10 '15 at 06:01
  • 1
    Not really heuristic. They are simple mechanical transformations. http://en.wikipedia.org/wiki/Heuristic ... a relevant heuristic here would be: "sets of small integers are most efficiently encoded into bit strings" while "any set containing larger integers is better treated as a string of digits") – Jim Dennis Mar 10 '15 at 06:04