1

I'm looking for a way to implement a hashing mechanism to hash an input (0 to 2^32 - 1) to a fixed possibly 12 character hash.

Background:

I have a transaction table, where the primary key is auto increment (max size is 2^32) and I have to show an invoice no to the client which has to be of decent characters length (I'm thinking 12) and so since the client shouldn't get id as 0000-0000-0001, I was thinking hashing is the best way to go.

The main requirement (that I can think of) is that many to one mapping should never take place, and should not be slow.

Would it be okay if I use a common hashing mechanism and then drop the extra characters. (md5 for example in php generates 32 character string)?

The way I understand, there is no need to be secure cryptographically, and so I can generate a custom hash if possible.

Similar links:

1) Symmetric Bijective Algorithm for Integers

2) Pseudo-random-looking one-to-one int32->int32 function

Community
  • 1
  • 1
Junior Superman
  • 141
  • 1
  • 4
  • 11
  • How the id should look like? Why is it not possible just to return the primary key? – kraskevich Dec 31 '14 at 12:35
  • 1
    Making your own hashing algorithm is fine, fun, but also harder than it seems. Dropping the "unwanted characters" on a known algorithm is hazardous. You may hit duplicates. Having said so, Git uses SHA-1, and the first 8 characters are often enough for "unicity in practice". I would not rely on that without double checking! Happy New Year. – Eric Platon Dec 31 '14 at 12:36
  • @user2040251 : the id is auto increment, and I don't think that I should give invoice numbers as 000...001 and so on. – Junior Superman Dec 31 '14 at 12:36
  • What about something like: 1. Take the key. Map each digit to another character(using a fixed table). 2. Return the result. – kraskevich Dec 31 '14 at 12:39
  • @user2040251 this would not really work, since if i map 0 to 9, it'd just look like 999...991 instead of 000...0001 – Junior Superman Dec 31 '14 at 12:41
  • @JuniorSuperman You can use different mappings for different positions. – kraskevich Dec 31 '14 at 12:41
  • 4
    Multiplying by any odd number modulo 2^32 is a bijection, so that guarantees no duplicates but shuffles it up a bit anyway (pick a big number, otherwise it's very noticeable), but anyone willing to to give it more than a cursory glance will figure out what you did. – harold Dec 31 '14 at 12:46
  • 1
    Eric Lippert showed an easy way to do this with a multiplicative inverse. See http://ericlippert.com/2013/11/14/a-practical-use-of-multiplicative-inverses/ – Jim Mischel Jan 02 '15 at 16:21

1 Answers1

0

Using md5 and chopping off most of it is not a good idea, because there is no guarantee that you would get a unique cache. Besides, you have much easier alternatives available to you, because you have a lot more bits than you need.

Values in the range [0..232] need 32 bit (duh!). You have 12 printable characters, which give you 72 bits if you stay within Base-64 encoding range of characters. You don't even need that many characters - you can use three bits per character for the initial eight characters, and two bits per character for the last four digits. This way your 12 characters would stay in the range ['0'..'7'], and the last four would be in the range ['0'..'3']. Of course you are not bound to numeric digits - you could use letters for some groups of digits, to give it a more "randomized" appearance.

the id is auto increment, and I don't think that I should give invoice numbers as 000...001 and so on.

Start with least significant bits when you generate these representations, then proceed to least significant, or make an arbitrary (but fixed) map of which bits go to what digit in the 12-character representation. This way the IDs would not look sequential, but would remain fully reversible.

Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523