4

I am looking for constant time algorithm can change an ordered integer index value into a random hash index. It would nice if it is reversible. I need that hash key is unique for each index. I know that this could be done with a table look up in a large file. I.E. create an ordered set of all ints and then shuffle them randomly and write to a file in random sequence. You could then read them back as you need them. But this would require a seek into a large file. I wonder if there is a simple way to use say a pseudo random generator to create the sequence as needed?

Generating shuffled range using a PRNG rather than shuffling the answer by erikkallen of Linear Feedback Shift Registers looks like the right sort of thing. I just tried it but it produces repeats and holes.

Regards David Allan Finch

Community
  • 1
  • 1
David Allan Finch
  • 1,414
  • 8
  • 20
  • I don't think there is quite enough information here to come up with a good solution. How many integers do you need to hash? Will there be duplicates in that list of integers? What is the range of values that your list will have? – EvilTeach Feb 12 '09 at 01:24
  • Are the ordered integers, allowed to be negative? – EvilTeach Feb 12 '09 at 01:25
  • I was intending to use the full range of an unsigned long or long long (ie 32bit or 64bit). – David Allan Finch Feb 12 '09 at 10:29
  • just curious-- how does the "genetic algorithms" tag come into the picture? – Jason S Feb 12 '09 at 16:28
  • I typed Algorithm as that seamed to me what I was after. The auto complete put in "genetic algorithms". I am happy to remove the tag if it is wrong. – David Allan Finch Feb 12 '09 at 22:12

5 Answers5

6

The question is now if you need a really random mapping, or just a "weak" permutation. Assuming the latter, if you operate with unsigned 32-bit integers (say) on 2's complement arithmetics, multiplication by any odd number is a bijective and reversible mapping. Of course the same goes for XOR, so a simple pattern which you might try to use is e.g.

unsigned int hash(int x) {
   return (((x ^ 0xf7f7f7f7) * 0x8364abf7) ^ 0xf00bf00b) * 0xf81bc437;
}

There is nothing magical in the numbers. So you can change them, and they can be even randomized. The only thing is that the multiplicands must be odd. And you must be calculating with rollaround (ignoring overflows). This can be inverted. To do the inversion, you need to be able to calculate the correct complementary multiplicands A and B, after which the inversion is

unsigned int rhash(int h) {
    return (((x * B) ^ 0xf00bf00b) * A) ^ 0xf7f7f7f7;
}

You can calculate A and B mathematically, but the easier thing for you is just to run a loop and search for them (once offline, that is).

The equation uses XORs mixed with multiplications to make the mapping nonlinear.

Antti Huima
  • 25,136
  • 3
  • 52
  • 71
  • Interesting are A and B not inverses A=1/0x8364abf7 or are there rounding problems. – David Allan Finch Feb 12 '09 at 12:47
  • No no, they are not inverses in that sense. They are inverses in the finite multiplication group modulo 2**32. It has nothing to do with inverses in the field of rational numbers. – Antti Huima Feb 12 '09 at 15:28
  • I had a quick run for the first 100,000 numbers and the results looks good and very fast. I hope to have more time tomorrow to test it with a larger set of numbers. – David Allan Finch Feb 12 '09 at 22:15
  • Yes, it is good and fast :) The only thing is that lowest-order bits are pretty linear... but that won't harm most likely in your application. If you want to break that, you can add ((x<<13)|(x>>19)) to the equation after first multiplication. The reverse is obvious. – Antti Huima Feb 13 '09 at 01:14
3

You could try building a suitable Feistel network. These are normally used for cryptography (e.g. DES), but with at least 64 bits, so you may need to build one yourself that suits your needs. They are invertible by construction.

starblue
  • 55,348
  • 14
  • 97
  • 151
1

Assuming your goal is to spread out grouped values across the whole range,
it seems like shuffling the bits in some pre-defined order might do the trick.
i.e. given 8 bits ABCDEFGH, arrange them like EGDBHCFA, or some such pattern.

The code would just be a simple sequence of masks, shifts and adds.

AShelly
  • 34,686
  • 15
  • 91
  • 152
0

Mmm... depending if you have a lot of numbers, you could use a normal stl list, and order it by a "random" criteria

bool
nonsort(int i, int j)
{
    return  random() & 31 >16 ? true : false;
}

std::list<int> li;
// insert elements
li.sort(nonsort);

Then, you can get all the integers with a normal iterator. Remember to initialize random with srand() with time or any other pseudo-random value.

Diego Sevilla
  • 28,636
  • 4
  • 59
  • 87
  • I did not know you could do that with sort and for smallish values I think this would be a good solution. But I was thinking about a size of a unsigned long long for the full range. – David Allan Finch Feb 11 '09 at 21:35
-1

For the set of constraints there really is no solution. An attempt to hash 32 bit unsigned, into a 32 bit unsigned, will give you collisions, unless you do a something simple, like a 1 to 1 mapping. Every number is its own hash.

EvilTeach
  • 28,120
  • 21
  • 85
  • 141