I have a very long string and I need a unique ID to use as its cache key. The ID doesn't need to be reversible, but equal strings must always return the same ID.

For example:

  • this-is-a-very-long-string -> SFG2527G
  • this-is-something-else -> JSNTFK2783
  • this-is-a-very-long-string -> SFG2527G

I don't need to reverse the hash.

What's the best way to achieve this in JavaScript?

Costantin
  • https://developer.mozilla.org/en-US/docs/Web/API/Web_Crypto_API – Teemu Jan 21 '21 at 09:24
  • I don't understand the question. What you describe is basically what a hash does, so since you already tagged your question with `cryptojs`, just choose a hashing algorithm (e.g. SHA-1), and use that. Just keep in mind: the shorter your hash values, the higher the risk of collisions. – Robby Cornelissen Jan 21 '21 at 09:26
  • Yes, this seems to work: `const long_string = 'hey there'` `const hash = require('crypto').createHash('sha256').update(long_string, 'utf8').digest('hex');` – Costantin Jan 21 '21 at 09:29
  • I thought a hash would always be longer than the string, but that's not the case. I tried with a very long string and the hash is still reasonably short. – Costantin Jan 21 '21 at 09:30
  • Most hash algorithms generate hashes of a fixed length. For SHA-256, it's 256 bits, so 64 hex characters. – Robby Cornelissen Jan 21 '21 at 09:31
  • Great. Feel free to post it as an answer and I'll accept it. Thank you. – Costantin Jan 21 '21 at 09:33

1 Answer

The shorter the hash, the higher the chances of collision.

However, in the Java world, there is a string hashCode helper that returns a "not so unique" 32-bit integer for a string. It's about the smallest ID you can map a string to, but it doesn't guarantee uniqueness and it suffers collisions.

Accordingly, I strongly discourage you from using this in the wild, but for the sake of the answer, here is how you can play around with such a hash:

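// JavaScript port of Java's String.hashCode: h = 31 * h + charCode.
function hashCode(s) {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    // `| 0` truncates to a signed 32-bit integer, mirroring Java's int overflow.
    h = (31 * h + s.charCodeAt(i)) | 0;
  }
  return h;
}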
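To make the collision warning concrete, here is a small sketch. The function is restated so the snippet runs on its own, and the two sample strings "Aa" and "BB" are my own illustrative inputs, not something from the question.

```javascript
// Same Java-style hash as above, restated so this snippet is self-contained.
function hashCode(s) {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (31 * h + s.charCodeAt(i)) | 0; // keep h in signed 32-bit range
  }
  return h;
}

// Equal strings always give the same ID ...
console.log(hashCode('this-is-a-very-long-string') === hashCode('this-is-a-very-long-string')); // true

// ... but different strings can give the same ID too:
// 'Aa' -> 65 * 31 + 97 = 2112, 'BB' -> 66 * 31 + 66 = 2112
console.log(hashCode('Aa'), hashCode('BB')); // 2112 2112
```

With only 32 bits of output, a cache keyed on this value could return the wrong entry for two different strings, which is why I wouldn't rely on it.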

On the other hand, SHA-256 is a one-way hash that "doesn't suffer collisions" (it does, but far less than MD5, SHA-1, or the hashCode above). The resulting ID is longer, but in practice it can be relied on to work as expected, and its usage is explained on MDN.

P.S. Node.js 15+ has a crypto.webcrypto namespace that mirrors the Web Crypto API, so you can use the same code in browsers and on the server.
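As a minimal sketch of the SHA-256 route, assuming an environment that exposes the Web Crypto API (a browser, or Node.js via `crypto.webcrypto`): the helper name `sha256Hex` and the fallback lookup are my own choices for illustration, not part of any library.

```javascript
// Assumption: use the global Web Crypto object when present (browsers, recent Node.js),
// otherwise fall back to Node's crypto.webcrypto namespace.
const subtle = (globalThis.crypto && globalThis.crypto.subtle) ||
  require('crypto').webcrypto.subtle;

// Hypothetical helper: returns the SHA-256 digest of `text` as 64 hex characters.
async function sha256Hex(text) {
  const bytes = new TextEncoder().encode(text);          // UTF-8 encode the string
  const digest = await subtle.digest('SHA-256', bytes);  // ArrayBuffer of 32 bytes
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('');
}

// Usage: the ID has a fixed length no matter how long the input is.
sha256Hex('this-is-a-very-long-string').then((id) => console.log(id.length)); // 64
```

Note that `subtle.digest` is asynchronous, so the ID arrives as a promise. If you only target Node.js and want a synchronous call, the `createHash('sha256')` one-liner from the comments under the question does the same job.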

Andrea Giammarchi
  • It's in quotes because it **does** suffer collisions, like every one-way hash that produces a finite set of chars, but they are extremely rare in the real world. It's considered much more robust than MD5 or SHA-1, and it has better performance than SHA-512, which also suffers collisions. They all do. Compared to the *Java* hash, which suffers more collisions than anything else, SHA-256 is a safe enough, and fast, bet. – Andrea Giammarchi Jan 21 '21 at 09:42
  • Putting it in double quotes means that you should take it with a pinch of salt. Yes, collisions do occur but are extremely unlikely. 2^256 is approximately 1.158 * 10^77. That's a lot of possible values and can therefore hash a lot of possible inputs without collisions. – phuzi Jan 21 '21 at 09:44
  • P.S. I've edited the answer so we can move forward with our day. – Andrea Giammarchi Jan 21 '21 at 09:44