
We have a table where we store tokens for users (i.e. accessTokens).

The problem is that tokens can sometimes be longer than 255 characters, and MySQL/MariaDB cannot store them in a table that has a unique index on this column.

We need unique indexes, so one solution is to add an additional column containing a hash of the token (at most 255 characters) and put the unique index on that column. Every search/save goes through this hash; after a match, we select the whole token and send it back. After a lot of thinking and googling, this is probably the only viable solution for this use case (but feel free to suggest another one).
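For concreteness, a rough sketch of the scheme we have in mind (the table and column names here are made up; which hash function to put in token_hash is exactly the open question):

    CREATE TABLE access_tokens (
        id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
        token      TEXT NOT NULL,          -- full token, may exceed 255 characters
        token_hash VARCHAR(255) NOT NULL,  -- hash of the token, at most 255 characters
        PRIMARY KEY (id),
        UNIQUE KEY uq_token_hash (token_hash)
    );

    -- Lookups go through the hash; comparing the full token afterwards
    -- rules out the rare collision:
    SELECT id, token
    FROM access_tokens
    WHERE token_hash = ? AND token = ?;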

Every token we generate right now is at least partially random, so a slight chance of hash collision is "ok": the user is not stuck forever, and the next request should pass.

Do you know of any good modern method (as of 2017)? Some statistical data about hash collisions for that method would be appreciated.

The hash is only for internal use - we don't need it to be secure (a fast insecure hash is best for us). It should be long enough to have a low chance of collision, but it must never, ever exceed the 255-character limit.

PS: Setting up a special version of the database/table that allows a greater length is not viable; we need this to work in some older systems without migration.

libik

2 Answers


Are these access tokens representable with 8-bit characters? That is, are all the characters in them taken from the ASCII or iso-8859-1 character sets?

If so, you can get a unique index longer than 255 characters by declaring the access-token column with COLLATE latin1_bin. The limit on an index prefix is 767 bytes, but utf8 characters in VARCHAR columns take 3 bytes each.

So a column of up to 767 latin1 characters should be uniquely indexable. That may solve your problem if your tokens all fit in about 750 bytes.
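A minimal sketch, assuming the default 767-byte index prefix limit and a made-up table name:

    CREATE TABLE access_tokens (
        -- 750 latin1 characters = 750 bytes, under the 767-byte prefix limit,
        -- so the whole token can carry the unique index directly
        token VARCHAR(750) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
        UNIQUE KEY uq_token (token)
    ) ENGINE=InnoDB;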

If not ...

You've asked for a hash function for your long tokens with a "low" risk of collision. SHA1 is pretty good, and is available as a function in MySQL. SHA512 is even better, but doesn't work in all MySQL servers. But the question is this: What is the collision risk of taking the first, or last, 250 characters of your long tokens and using them as a hash?

Why do I ask? Because your spec calls for a unique index on a column that's too long for a MySQL unique index. You're proposing to solve that problem by using a hash function that is also not guaranteed to be unique. That gives you two choices, both of which require you to live with a small collision probability.

  1. Add a hash column that's computed by SHA2('token', 512) and live with the tiny probability of collision.
  2. Add a hash column that's computed by LEFT('token', 255) and live with the tiny probability of collision.

You can implement the second choice simply by removing the unique constraint on your index on the token column. (In other words, by doing very little.)
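A sketch of the first choice, assuming a table with a long token column and made-up names (SHA2(token, 512) returns a 128-character hex string, comfortably under the 255 limit):

    ALTER TABLE access_tokens
        ADD COLUMN token_hash CHAR(128) CHARACTER SET latin1 NOT NULL,
        ADD UNIQUE KEY uq_token_hash (token_hash);

    -- Compute the hash on every insert ...
    INSERT INTO access_tokens (token, token_hash)
    VALUES (@token, SHA2(@token, 512));

    -- ... and look up through it:
    SELECT token FROM access_tokens
    WHERE token_hash = SHA2(@token, 512);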

The SHA hash family has well-known collision characteristics. Evaluating any other hash function would require knowing the collision characteristics of your long tokens, and you haven't told us those.

O. Jones

Comments on HASHing

UNHEX(MD5(token)) fits in 16 bytes - BINARY(16).
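A sketch of what that could look like (table and column names are made up):

    ALTER TABLE access_tokens
        ADD COLUMN token_md5 BINARY(16) NOT NULL,  -- MD5 is 128 bits = 16 bytes
        ADD UNIQUE KEY uq_token_md5 (token_md5);

    INSERT INTO access_tokens (token, token_md5)
    VALUES (@token, UNHEX(MD5(@token)));

    SELECT token FROM access_tokens
    WHERE token_md5 = UNHEX(MD5(@token));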

As for collisions: Theoretically, there is only one chance in 9 trillion that you will get a collision in a table of 9 trillion rows.

For SHA() in BINARY(20) the odds are even lower. Bigger SHAs are, in my opinion, overkill.

Going beyond the 767 limit to 3072

⚈ Upgrade to 5.7.7 (MariaDB 10.2.2?) for the 3072-byte limit -- but your cloud may not provide this;
⚈ Reconfigure (if staying with 5.6.3 - 5.7.6 (MariaDB 10.1?)) -- 4 things to change: the Barracuda file format + innodb_file_per_table + innodb_large_prefix + ROW_FORMAT=DYNAMIC or COMPRESSED (see the sketch below).

Later versions of 5.5 can probably perform the 'reconfigure'.
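A sketch of that 'reconfigure', assuming MySQL 5.6.3 - 5.7.6 and a made-up table name:

    # my.cnf
    [mysqld]
    innodb_file_format    = Barracuda
    innodb_file_per_table = ON
    innodb_large_prefix   = ON

    -- With the settings above, the DYNAMIC row format allows index keys
    -- up to 3072 bytes:
    CREATE TABLE access_tokens (
        token VARCHAR(1000) NOT NULL,   -- 1000 utf8 chars = 3000 bytes <= 3072
        UNIQUE KEY uq_token (token)
    ) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;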

Similar Question: Does MariaDB allow 255 character unique indexes?

Rick James