0

In my project, on of the requirements is to generate 7 digit alphanumeric code for every transaction user does.

One user can be part of multiple organisations and for every new transaction that user does in any of the organisations we want to generate 7 digit alphanumeric non-repeating code.

Implementation:

Every time we generate a new code we don't want to check in the database whether that id existed or not because that will not scale.

we want to have a function which if we call n number of times will return n number of unique codes or any other implementation also works fine.

I've tried to research on how git generates 7 digit unique sha inside the project without any success,

Any kind of help pointing to resources or boilerplate code is really appreciated.

If it helps we are using node.js and mongodb in our backend.

Sasikanth
  • 3,045
  • 1
  • 22
  • 45

1 Answers1

4

Git does not generate a 7 digit unique code.

Git uses a cryptographic hash function—currently SHA-1, with SHA-256 on the horizon—to generate a 160-bit number for each commit, with the hope that the result is unique based on the commit's data itself being unique. As we know from the pigeonhole principle, this is doomed to fail eventually. We can do some mathematics to determine approximately when it will fail. With SHA-1, we hit a 50% probability of a hash collision—of the full 160-bit hash—between any two particular inputs when we have reached about 280 unique inputs. Using a 256-bit hash (SHA-256) gives us 2128 instead: in each case, it is the square root of the number of possible hashes.

The birthday paradox makes the collision probability increase much faster than this naive analysis suggests, however. Fortunately, even with the relatively poorer 160-bit hash, we're still good for having the probability of collisions fall below about 10-18 even with over 1 quadrillion unique inputs.

This does, however, assume random inputs. Deliberate attacks, made with knowledge of the hashing algorithm, can produce collisions on purpose. See http://shattered.it/ for an example.

(160 bits of UUID requires 40 hexadecimal characters to express; 256 bits requires 64 characters. There are more-dense encodings, of course. The densest such encoding is raw binary, where 160 bits fits in 20 bytes, and 256 bits fits in 32 bytes. The 7 characters you see in git log output is the result of truncating the 40 character hexadecimal representations to 7 characters, which probabilistically will not have a hash collision until the number of objects grows fairly large.)

torek
  • 448,244
  • 59
  • 642
  • 775
  • That’s great to know. In my usecase, what can be we do about it? – Sasikanth Jul 10 '21 at 05:38
  • You can use any UUID scheme you like. A set of random bytes is as good as any, unless you have some reason to want a predictable but one-way hash such as SHA or Blake or some other cryptographic hash. – torek Jul 10 '21 at 09:02
  • I just ran a simple test with my requirement, which is random 7 digit alpha numeric digit with this line `crypto.randomBytes(4).toString('hex').substring(0,7);` I've run it for million requests and I found `1800` duplicates with that. I can handle error cases in my project but is there any way I can write a function which returns unique string every time. – Sasikanth Jul 12 '21 at 03:06
  • Not by use of randomness. See Equation 4.1 on p. 77 of [my in-progress book](http://web.torek.net/torek/tmp/book.pdf). – torek Jul 12 '21 at 03:45