0

How optimal is it to have a primary key on 3 or 4 fields? If the table has say millions of records, is it going to be heavy on the server running a query such as:

Select * from my_table where field1='123' and field_2='123' and field_3='hours'

The primary key is created on these fields: field_1 int(11) field_2 int(11) field_3 varchar(20)

What I'm considering doing as an alternative is have those fields store the data with a primary key on a separate field which has an md5 hash of the data such as "md5(field_1+'-'+field_2+'-'+field_3) and then my script just queries one field such as:

Select * from my_table where field_hash=MD5('123-123-hours')

So basically I'm just wondering if method 1 is just as optimal as method 2 with a table with millions of records.

Joe
  • 3,043
  • 9
  • 43
  • 56

3 Answers3

1

I'd say your best option is to use a surrogate auto-incrementing field as the PK. Failing that I'd just use the three fields.

The md5 hash doesn't seem worth the complexity. I really don't see the benefit of that approach in any scenario. Don't try to outsmart the DB engine. If a hash was indeed faster, the indexing engine would be implemented internally that way for composite keys. It is not, which should tell you something.

With the surrogate key you get faster joins, with the composite key you get some performance benefits when you have queries that only return fields that are part of the primary key (covering indexes).

JohnFx
  • 34,542
  • 18
  • 104
  • 162
0

You can read about the composite key performance from the answers to question Composite Primary Key performance drawback in MySQL

Before doing this kind of optimization, you should always measure the effects. That is, create two tables with the same data, one using composite key and the other the hash, and try which one works better in you use case.

In general, I don't like using nonsensical key values if not absolutely necessary. If the hash is used as the primary key it means that the user of the database must be aware of the ID generation process. This leads to more documentation, that will not be read, and errors in the long run.

Instead of using the composite key you might want to see, if there is a possibility of normalizing your database further. Does the composite key represent a different entity and should it actually form an second table where you can attach a surrogate key to the set of columns?

Another option is to use a surrogate key in the current table and then place a unique constraint to the current composite key e.g.

create table 
   id int(11) primary key,
   field1 int(11), 
   field_2 int(11), 
   field_3 varchar(20), 
   constraint uq_composite unique (field1, field_2, field_3);
Community
  • 1
  • 1
Aleksi Yrttiaho
  • 8,266
  • 29
  • 36
0

I would try to avoid using a non sequential primary key (i.e. a string, or numbers that are randomly generated) because this causes more I/O on the disk and reduces performance on some storage engines (particularly MyISAM).

JamesHalsall
  • 13,224
  • 4
  • 41
  • 66