14

I was thinking about how I'm storing passwords in my database : appropriately salted SHA1 strings in a CHAR(40) field. However, since the character data in there is actually just a hex representation of a 160 bit number, I thought it might be better to store it as BINARY(20).

CREATE TABLE users (
    password BINARY(20)
    /* snip */
);

INSERT INTO users (password) VALUES (UNHEX(SHA1('mypassword'));

As I see it, one benefit of this approach is that it halves the size of that field, but I can imagine there's probably some downsides too.

What's your opinion?

nickf
  • 537,072
  • 198
  • 649
  • 721
  • You would only be saving a few bytes per password. Is it worth it? – pavium Nov 11 '09 at 04:09
  • 1
    well that's what I'm wondering. The benefits might be slim, but what are the costs? – nickf Nov 11 '09 at 04:20
  • OK, it seems there's general agreement, the benefits are slight and no-one's suggested any significant costs. If you made the change, would future backups be compatible with past backups? Would any code have to be changed? – pavium Nov 11 '09 at 04:42

6 Answers6

28

We used binary for a ton of different ids in our database to save space, since the majority of our data consisted of these ids. Since it doesn't seem like you need to save space (as it's just passwords, not some other huge scale item), I don't see any reason to use binary here.

The biggest problem we ran into was constantly, annoyingly, having binary data show up in the console (everytime you type select * you hear a million beeps), and you have to always do select HEX() or insert UNHEX(), which is a pain.

Lastly, if you mix and match (by mistake) binary and HEX/UNHEX and join on this value, you could match records you never intended to.

wsorenson
  • 5,701
  • 6
  • 32
  • 29
  • I love your use of `BINARY` to conserve space! Do you think you could help me get on the right track here? http://stackoverflow.com/questions/15539540/convert-c-int-to-varbinary-and-back-again –  Mar 23 '13 at 20:10
9

Here is my breakdown:

  1. If you use strings instead of binary, use a fixed length field. Since the hashing algos all output a fixed length you can save yourself some space there.
  2. Since you are only doing an equality comparison, there is no need for indexes. Binary fields have no collation type or character set.
  3. BINARY column types have no odd storage caveats like BLOBs do.
  4. Each hexadecimal character represents 4 bits in the 8 (or 7) bits it consumes. This means that binary storage is twice as efficient.
  5. MOST IMPORTANT: Unless you are working in an embedded system where each byte counts, don't do it. Having a character representation will allow you better debugging. Plus, every time a developer is working a problem like this I have to wonder why. Every architectural decision like this has trade-offs and this one does not seem like it adds value to your project.
  6. You can always convert to BINARY later with a simple SQL script.

In short, use a fixed length text field. There is no gain to counting bytes in the current world, especially when change is easy to achieve.

Dharman
  • 30,962
  • 25
  • 85
  • 135
Jeffrey Hulten
  • 773
  • 5
  • 11
2

The hard disk space savings of storing your hashed passwords as binary rather than varchar are probably insignificant. How many users are you likely to have in this table? Multiply that by the space difference between BINARY(20) and VARCHAR(n) and I think you'll find it's not a significant savings. Personally, I would prefer the hex representation because at least I can type it in a query if I'm doing some ad-hoc operation during development or writing a unit test to validate password related operations. Hex is somewhat more readable than binary if I happen to be loading a data dump in a text editor, etc. My bottom line is that the hex representation would be more convenient during the development cycle.

Asaph
  • 159,146
  • 25
  • 197
  • 199
2

If you want a easy way to store binary in sql... you can convert to hex before. Check out this page: http://kekoav.com/blog/36-computers/58-uuids-as-primary-keys-in-mysql.html

Convert to hex, take off the "-" and put "0x" in front of the string. Mysql will understand as a byte content.

Example:

INSERT INTO users SET password=0x1e8ef774581c102cbcfef1ab81872213
Dharman
  • 30,962
  • 25
  • 85
  • 135
lexmooze
  • 381
  • 3
  • 4
1

This is an old question but I noticed nobody has mentioned data validation as an advantage to a BINARY column. Specifically, it is possible to store an invalid value in a CHAR(40) column by using characters that are not hex digits (0-9, a-f).

You could still insert the wrong value into the BINARY column (for example, if you forget to call UNHEX), but you will never have to consider reading a value from the database that doesn't parse correctly.

benzado
  • 82,288
  • 22
  • 110
  • 138
1

Why reinvent the wheel? Why not use CHAR(41) like table `mysql.user' uses? It's a well-known format, so any future maintainers won't be scratching their heads over your special scheme? Make it easy on everyone by just noting "just like MySQL passwords."

David M
  • 4,325
  • 2
  • 28
  • 40