6

I can either have an auto increment id field as my primary key or a sha1 hash.

Which one should I choose?

Which would be better in terms of performance?

cgwebprojects
  • 3,382
  • 6
  • 27
  • 40
  • 1
    Which you should choose will depend on your application. In terms of performance, I'd go for the option that doesn't require the computation of an expensive hash. – eggyal May 26 '12 at 05:35
  • My 2 tables involve a sha1 hash being stored that relates each row, I was asking as a varchar(40) column on 2 tables i think is more detrement than 1 varchar(40) and 2 int columns. – cgwebprojects May 26 '12 at 05:37
  • @cgwebprojects 2 char(40) (no need for varchar) is 80 bytes (depending on character set). 1 char(40) and 2 ints is 48 bytes. Also, indexes are going to be faster on ints than on a char(40). – Corbin May 26 '12 at 05:40
  • Thanks this is what I was trying to work out, will change it to `CHAR` also :) – cgwebprojects May 26 '12 at 05:42

3 Answers3

22

There are a few application-driven cases where you'd want to use a globally unique ID (UUID/GUID):

  1. You expect to (or are) using a sharding strategy to scale writes. You don't want the shard nodes to duplicate keys.
  2. You want to be able to safely port data from one node to another preserving keys. This is critical if you want to keep foreign-key relationships in-tact.
  3. Your application is also used offline (in-home sales, in-home repairs, etc.) where the offline application periodically syncs with the "source of truth". You'd want those offline keys to be unique without having to make a remote call. Otherwise, it is up to you to come up with a strategy to reorganize keys and relationships on the way in. With an auto-increment strategy and depending on the RDBMS you are using, this is likely a non-trivial task.

If you don't have a use-case from above or something similar, you may use an auto-increment id if that makes you comfortable; however, you may still want to consider UUID/GUID

The Trade Off:

There are a lot of opinions held about the speed / size of UUID/GUID keys. At the end of the day, it is a trade-off and there are lots of ways to gain or lose speed with a database. Ideally, you want your indexes to be stored in RAM in order to be as fast as possible; however, that is a trade-off you have to weigh against other considerations.

Other Considerations regarding UUID/GUID:

  1. Many RDBMS can produce a UUID.
  2. You can also produce a UUID via your application (you aren't tied to the RDBMS to generate).
  3. Developers / Testers can easily port data from environment to environment and have the application work as expected. This is an often overlooked use-case; however, it is one of the stronger cases for using a UUID/GUID strategy.
  4. There are databases that are optimized for use offline (CouchDB) where UUID is what you get.
Community
  • 1
  • 1
Wil Moore III
  • 6,968
  • 3
  • 36
  • 49
1

Use an auto increment id.

  • An ID does not have to be generted only incremented.
  • Hashes fit better for storing passwords.
  • You could get duplicate keys using SHA hashes. The chance is small but real.
  • An ID is way more readable
  • An ID is kind of an inserttion history. You know which record was inserted last (highest ID)
juergen d
  • 201,996
  • 37
  • 293
  • 362
  • 2
    You'd probably sooner reach the limit of what can be stored in an integer than find a collision with an SHA hash. Once again once you get to a shaded database auto increments become painful. – mjsa Mar 31 '15 at 09:33
1

Almost definitely an auto incrementing integer. It will be faster to create, faster to search, and smaller. Consider for example if you had another table that referenced it. Would you want it to reference it via an integral primary key or via a sha1 hash? An integer would be more meaningful (in a way), and it would be much (much!) more efficient.

Corbin
  • 33,060
  • 6
  • 68
  • 78
  • Thanks again! As I was storing a sha1 on one table I didn't know whether to link it to the other with the same sha1, but if an auto increment is better then so be it! – cgwebprojects May 26 '12 at 05:40
  • An auto incrementing ID just fits the design of a RDBMS better than linking via sha1 hashes. Integral ids are small, very fast to index and incrementing them is very cheap for the DB. sha1 are going to make bigger, slower indexes, and as juergen d noted, they are prone to collisions. (His answer should actually probably be the accepted one as it covers everything mine did + the collision) – Corbin May 26 '12 at 07:38
  • Can you imagine sharding a database with an auto increment ID? Actually no, you don't; because Instragram have already been through it: http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram Auto increment IDs are ridiculous to scale. – mjsa Mar 31 '15 at 09:30