162

I have a simple question which occured when I wanted to store the result of a SHA1 hash in a MySQL database:

How long should the VARCHAR field be in which I store the hash's result?

Gumbo
  • 643,351
  • 109
  • 780
  • 844
niklasfi
  • 15,245
  • 7
  • 40
  • 54
  • 10
    If you just googled sha1 click im feeling lucky and you should be on wikipedia where you can find it is always 160 bits. – Tim Matthews Mar 05 '09 at 12:19

7 Answers7

319

I would use VARCHAR for variable length data, but not with fixed length data. Because a SHA-1 value is always 160 bit long, the VARCHAR would just waste an additional byte for the length of the fixed-length field.

And I also wouldn’t store the value the SHA1 is returning. Because it uses just 4 bit per character and thus would need 160/4 = 40 characters. But if you use 8 bit per character, you would only need a 160/8 = 20 character long field.

So I recommend you to use BINARY(20) and the UNHEX function to convert the SHA1 value to binary.

I compared storage requirements for BINARY(20) and CHAR(40).

CREATE TABLE `binary` (
    `id` int unsigned auto_increment primary key,
    `password` binary(20) not null
);
CREATE TABLE `char` (
    `id` int unsigned auto_increment primary key,
    `password` char(40) not null
);

With million of records binary(20) takes 44.56M, while char(40) takes 64.57M. InnoDB engine.

John John Pichler
  • 4,427
  • 8
  • 43
  • 72
Gumbo
  • 643,351
  • 109
  • 780
  • 844
  • 2
    In PostgreSQL, this would translate to using a bytea field, right? – mvexel Jan 17 '11 at 10:44
  • 1
    The solution is great, but there is another point to use char(40) with hexed sha1 - this is much more widely used, and there will be less conversion issues in an application code. – Arthur Kushman Sep 09 '13 at 08:29
  • 3
    Note for phpmyadmin users. When storing the hash as binary, phpmyadmin will display the it as a hex string, but pma will be unable to use it in the provided "search tab". Will work only if you add the `UNHEX()` manually to the sql. – Timo Huovinen Jan 14 '14 at 12:05
  • 2
    @Gumbo You can store a variable number of bytes in a bytea. You are referring to the storage requirements of the bytea type. Which is "1 or 4 bytes plus the actual binary string". What the "1 or 4" refers to could be the length of the data stored, since you cannot use a zero byte to end the string as you do with varchar. That implies, but isn't stated in the manual, that you can store up to 2^(8*4) or 4+ gigabytes in a bytea. http://www.postgresql.org/docs/9.0/static/datatype-binary.html Storing the hash in a postgres database would probably be smallest as a _bit_ or bytea column. – Viktor May 05 '14 at 19:17
  • Please update the [UNHEX function link](http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_unhex) – Nikunj Madhogaria Aug 18 '15 at 11:07
  • 2
    http://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_sha1 provides info on performance and storage when storing results of crypt functions – Clocker Apr 29 '16 at 15:36
  • I am not sure it is worth it to have non human readable data in the database to save 20 MB for 1 million records in 2016, except in very specialized cases where memory or storage is an issue. You will curse yourself every time you try to debug an issue. Use base64 and the overhead is only 10 MB. – Tony BenBrahim Jun 07 '16 at 11:35
  • If you choose to store a string representation and are counting on the fixed-length property of the hash, **beware of `CHAR`**: in multibyte charsets such as UTF-8, `CHAR` reserves multiple bytes for each character, as each character *could* be a complex one. E.g. a UTF-8 `CHAR` reserves 3 bytes per character. So in that case, go with `VARCHAR` or ASCII, depending on your requirements. – Timo Sep 08 '16 at 10:54
45

A SHA1 hash is 40 chars long!

schmilblick
  • 1,917
  • 1
  • 18
  • 25
11

Reference taken from this blog:

Below is a list of hashing algorithm along with its require bit size:

  • MD5 = 128-bit hash value.
  • SHA1 = 160-bit hash value.
  • SHA224 = 224-bit hash value.
  • SHA256 = 256-bit hash value.
  • SHA384 = 384-bit hash value.
  • SHA512 = 512-bit hash value.

Created one sample table with require CHAR(n):

CREATE TABLE tbl_PasswordDataType
(
    ID INTEGER
    ,MD5_128_bit CHAR(32)
    ,SHA_160_bit CHAR(40)
    ,SHA_224_bit CHAR(56)
    ,SHA_256_bit CHAR(64)
    ,SHA_384_bit CHAR(96)
    ,SHA_512_bit CHAR(128)
); 
INSERT INTO tbl_PasswordDataType
VALUES 
(
    1
    ,MD5('SamplePass_WithAddedSalt')
    ,SHA1('SamplePass_WithAddedSalt')
    ,SHA2('SamplePass_WithAddedSalt',224)
    ,SHA2('SamplePass_WithAddedSalt',256)
    ,SHA2('SamplePass_WithAddedSalt',384)
    ,SHA2('SamplePass_WithAddedSalt',512)
);
Toby Allen
  • 10,997
  • 11
  • 73
  • 124
Anvesh
  • 7,103
  • 3
  • 45
  • 43
  • 11
    Please, _please_, _**please**_ do not actually store passwords like this. – Berry M. Aug 22 '17 at 15:48
  • Hey berry, can you explain your WHY? in details – Anvesh Aug 23 '17 at 08:13
  • 4
    Storing simple hashes of passwords makes it much easier for the passwords to be "extracted" if your database is compromised than if you use a salted (hopefully stretched) password hash. Suggested reading: https://paragonie.com/blog/2016/02/how-safely-store-password-in-2016 – matt Sep 01 '17 at 15:20
  • 2
    @BerryM. reading this an year later, and didn't think for a sec that anyone is talking about passwords or that if people still use simple hash to store auth data. But they do :D – Rohit Hazra Sep 12 '18 at 11:37
6

Output size of sha1 is 160 bits. Which is 160/8 == 20 chars (if you use 8-bit chars) or 160/16 = 10 (if you use 16-bit chars).

inazaruk
  • 74,247
  • 24
  • 188
  • 156
3

So the length is between 10 16-bit chars, and 40 hex digits.

In any case decide the format you are going to store, and make the field a fixed size based on that format. That way you won't have any wasted space.

Douglas Leeder
  • 52,368
  • 9
  • 94
  • 137
2

If you need an index on the sha1 column, I suggest CHAR(40) for performance reasons. In my case the sha1 column is an email confirmation token, so on the landing page the query enters only with the token. In this case CHAR(40) with INDEX, in my opinion, is the best choice :)

If you want to adopt this method, remember to leave $raw_output = false.

Francesco Casula
  • 26,184
  • 15
  • 132
  • 131
  • 1
    Why would you not index BINARY(20)? Wouldn't that be just as fast and half as big in size? – nickdnk Feb 05 '17 at 15:23
  • Well this ~5 years ago but I think I was referring to the fact that you still need to unhex which adds some load (+ makes the application harder to maintain and less portable?). It kinda depends on your hardware as well, if you got less storage and it is slow also probably best to stick to the binary(20) otherwise I'd say char(40). Hard to say without running some tests with the language and hardware you'd be using and see what suits you best. – Francesco Casula Feb 06 '17 at 10:18
  • 1
    I suppose if you're doing anything other than select from where unhex(hash) = hash to fetch a single row, then perhaps you're right. But keeping the index buffered will take twice as much memory this way. – nickdnk Feb 06 '17 at 11:03
2

You may still want to use VARCHAR in cases where you don't always store a hash for the user (i.e. authenticating accounts/forgot login url). Once a user has authenticated/changed their login info they shouldn't be able to use the hash and should have no reason to. You could create a separate table to store temporary hash -> user associations that could be deleted but I don't think most people bother to do this.