
I'm trying my best to persuade my boss to let us use foreign keys in our databases - so far without luck.

He claims it costs a significant amount of performance, and says we'll just have jobs to clean up the invalid references now and then.

Obviously this doesn't work in practice, and the database is flooded with invalid references.

Does anyone know of a comparison, benchmark or similar which proves there's no significant performance hit to using foreign keys? (Which I hope will convince him)

Steffen
  • Just an update on the whole story: We've been allowed to use foreign keys now, on the notion that they may be disabled if they result in a performance loss. So thanks everyone for your good points :-) – Steffen Feb 25 '10 at 06:54
  • We disabled them. They are nice, but they kill performance if you are performing many single inserts and deletes on tables with a large number of constraints AND a high number of records (i.e. 100k or 1,000k+). In our case, one matrix took minutes to import; when we disabled the constraints, it took a few seconds. – TheLegendaryCopyCoder Oct 13 '21 at 16:31
  • Have you moved to another job now? I would. – Norbert Norbertson Nov 03 '21 at 09:02

7 Answers


There is a tiny performance hit on inserts, updates and deletes because the FK has to be checked. For an individual record this would normally be so slight as to be unnoticeable unless you start having a ridiculous number of FKs associated with the table (clearly it takes longer to check 100 other tables than 2). This is a good thing, not a bad thing, as databases without integrity are untrustworthy and thus useless. You should not trade integrity for speed. That performance hit is usually offset by the better ability to optimize execution plans.

We have a medium sized database with around 9 million records and FKs everywhere they should be and rarely notice a performance hit (except on one badly designed table that has well over 100 foreign keys, it is a bit slow to delete records from this as all must be checked). Almost every dba I know of who deals with large, terabyte sized databases and a true need for high performance on large data sets insists on foreign key constraints because integrity is key to any database. If the people with terabyte-sized databases can afford the very small performance hit, then so can you.

FKs are not automatically indexed and if they are not indexed this can cause performance problems.
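Adding the supporting index is a one-line fix. Below is a minimal sketch in Python with SQLite (the thread concerns SQL Server, but the behavior is analogous; the `customers`/`orders` schema and the index name are hypothetical, invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is set
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id))""")

# Declaring the FK does NOT index orders.customer_id; without this index,
# every DELETE on customers must scan all of orders to check for children.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

# Confirm which indexes exist on the child table
rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='index' AND tbl_name='orders'"
).fetchall()
print(rows)  # [('idx_orders_customer',)]
```

The same principle applies in SQL Server: index the referencing column yourself, especially if the parent table sees deletes.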

Honestly, I'd take a copy of your database, add properly indexed FKs, and show the time difference to insert, delete, update and select from those tables in comparison with the same operations on your database without the FKs. Show that you won't be causing a performance hit. Then show the results of queries that find orphaned records that no longer have meaning because the PK they are related to no longer exists. It is especially effective to show this for tables which contain financial information ("We have 2700 orders that we can't associate with a customer" will make management sit up and take notice).

HLGEM
  • Very useful information. I'm currently adding foreign keys to a copy of the database, however it takes some time, since I have to delete like 20-30 million invalid rows first. This just proves how big a problem the lack of keys was in the first place. – Steffen Feb 24 '10 at 10:07
  • "You should not trade integrity for speed" - and yet that is what most NoSql databases do... In some cases it might be worth the tradeoff, but one should seriously benchmark the actual effect before making that call. – noocyte Nov 22 '17 at 07:19
  • @noocyte and that is why NoSql databases are generally unsuitable for any serious business application, especially in regulated industries like finance and health care. – HLGEM Nov 22 '17 at 16:07
  • I know this post is really old but I'm wondering what the actual application was doing (or _not_ doing) to end up with 20-30 million invalid rows! Foreign keys aren't the _only_ way to preserve data integrity! – Charles Wood Nov 15 '21 at 17:56

From Microsoft Patterns and Practices: Chapter 14 Improving SQL Server Performance:

When primary and foreign keys are defined as constraints in the database schema, the server can use that information to create optimal execution plans.

Daniel Vassallo
  • Sure, it can use those indexes to optimize... but that doesn't automatically mean there is no negative impact on performance. I think too many factors are involved to put this question to sleep with this reference. – Roland Bouman Feb 23 '10 at 10:56
  • @Roland: I see your point, and I agree with you. However my intention was to address the "I'm trying my best to persuade my boss" part of the question. – Daniel Vassallo Feb 23 '10 at 10:58
  • @Roland: In addition, the performance hit is very rarely severe. As you said in your answer, this is difficult to claim without knowledge of the application, but in practice the benefits of foreign keys are rarely traded for performance. Related SO post: http://stackoverflow.com/questions/1744878/can-foreign-keys-hurt-query-performance/ – Daniel Vassallo Feb 23 '10 at 11:16
  • Good link, I'll send him that with the quote as well :-) – Steffen Feb 23 '10 at 11:48
  • What that statement means is that, for example, the optimizer will know that a column defined as the only member of a primary key is both not-null and uniquely defined. Similarly, a single-column foreign key's number of distinct values is constrained by the cardinality of the table on which it depends (plus possibly null). – Adam Musch Feb 24 '10 at 20:37

This is more of a political issue than a technical one. If your project management doesn't see any value in maintaining the integrity of your data, you need to be on a different project.

If your boss doesn't already know or care that you have thousands of invalid references, he isn't going to start caring just because you tell him about it. I sympathize with the other posters here who are trying to urge you to do the "right thing" by fighting the good fight, but I've tried it many times before and in actual practice it doesn't work. The story of David and Goliath makes good reading, but in real life it's a losing proposition.

Dave Ziffer
  • I do agree with @Dave here, but I encourage you to keep fighting the good fight anyway. It's unfortunate that sometimes management has fears about things that they (clearly, although we say it softly) don't understand. This costs companies not only in dollars for time spent rectifying issues that needn't have occurred in the first place, but it discourages creativity and dynamic solutions from the people who are actually best equipped to engineer those solutions. I retain the "never say die" attitude, until a solid proof is given or disproven... much to the irritation of my superiors... – Dawn Deschain May 19 '15 at 21:53

It is OK to be concerned about performance, but making paranoid decisions is not.

You can easily write benchmark code to show results yourself, but first you'll need to find out what performance your boss is concerned about and detail exactly those metrics.

As far as the invalid references are concerned, if you don't allow nulls on your foreign keys, you won't get invalid references: the database will raise an exception if you try to assign a foreign key value that does not exist in the referenced table. If you need "nulls", assign a key to be "UNDEFINED" or something like that, and make that the default key.
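To illustrate: with the constraint in place, the engine rejects the invalid reference at write time, so no cleanup job is ever needed. A sketch in Python with SQLite (hypothetical `customers`/`orders` schema; SQL Server raises a comparable constraint-violation error):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id))""")
conn.execute("INSERT INTO customers VALUES (1)")

conn.execute("INSERT INTO orders VALUES (1, 1)")       # valid reference: accepted
try:
    conn.execute("INSERT INTO orders VALUES (2, 99)")  # no customer 99: rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The bad row never reaches the table, which is exactly the guarantee a periodic cleanup job cannot give you.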

Finally, explain database normalisation issues to your boss, because I think you will quickly find that this issue will be more of a problem than foreign key performance ever will.

BenB
  • I'm pretty certain it's not any exact performance metric he's worried about, as he just states "it hurts performance" and that's pretty much it. Obviously this doesn't make it easier to prove wrong :-( – Steffen Feb 23 '10 at 11:48
  • If someone needs "nulls" he can have nullable columns; it won't create problems with FKs. I find the technique of adding "custom" "null rows" to primary tables just to avoid a nullable column in the foreign table worse than not using FKs at all. – George Mavritsakis Oct 09 '14 at 21:48
  • If you don't have FKs, then not allowing nulls in the FK column does nothing to prevent you from entering an invalid one. And it doesn't mean that the record will not get orphaned if the original PK record is deleted or changed. – HLGEM Jul 03 '17 at 13:50

Does anyone know of a comparison, benchmark or similar which proves there's no significant performance hit to using foreign keys ? (Which I hope will convince him)

I think you're going about this the wrong way. Benchmarks never convince anyone.

What you should do, is first uncover the problems that result from not using foreign key constraints. Try to quantify how much work it costs to "clean out invalid references". In addition, try to gauge how many errors these invalid references cause in the business process. If you can attach a dollar amount to that - even better.

Now for a benchmark - you should try and get insight into your workload, identify which type of operations are done most often. Then set up a testing environment, and replay those operations with foreign keys in place. Then compare.
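As a starting point, such a harness can be quite small. A sketch in Python with SQLite (hypothetical schema; as the rest of this answer says, real numbers must come from your own engine, data, and workload):

```python
import sqlite3
import time

def timed_inserts(with_fk: bool, n: int = 20_000) -> float:
    """Time n single-row child inserts, with or without an FK constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only with this on
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    fk = "REFERENCES parent(id)" if with_fk else ""
    conn.execute(f"CREATE TABLE child (id INTEGER PRIMARY KEY, pid INTEGER {fk})")
    conn.execute("INSERT INTO parent VALUES (1)")

    start = time.perf_counter()
    with conn:  # one transaction around all inserts
        conn.executemany("INSERT INTO child VALUES (?, 1)",
                         ((i,) for i in range(n)))
    return time.perf_counter() - start

print(f"without FK: {timed_inserts(False):.3f}s")
print(f"with FK:    {timed_inserts(True):.3f}s")
```

On a trivial schema like this the difference is typically small; the interesting comparison comes from replaying your actual mix of operations against your actual tables.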

Personally I would not claim right away without knowledge of the applications that are running on the database that foreign keys don't cost performance. Especially if you have cascading deletes and/or updates in combination with composite natural primary keys, then I personally would have some fear of performance issues, especially timed-out or deadlocked transactions due to side-effects of cascading operations.

But no one can tell you: you have to test it yourself, with your data, your workload, your number of concurrent users, your hardware, and your applications.

Roland Bouman
  • Good point about the application, however the tables are really simple - all primary keys are identity ints, and we have no cascading delete/updates. So basically it's as simple as it gets :-D – Steffen Feb 23 '10 at 11:40
  • Steffen, that is useful info. I would probably not have any reservations, unless dealing with massive write loads. – Roland Bouman Feb 23 '10 at 11:52
  • I believe we're doing way more reading than writing, so this shouldn't be much of an issue either. Thanks for the comment :-) – Steffen Feb 23 '10 at 12:52

A significant factor in the cost is the size of the index the foreign key references. If the index is small and frequently used, the performance impact will be negligible; large and less frequently used indexes will have more impact. But if your foreign key is against a clustered index, it still shouldn't be a huge hit. As @Roland Bouman says, though, you need to test to be sure.

Peter T. LaComb Jr.

I know this is a decade-old post, but database fundamentals are always in demand, so I'll refer to my own experience. One project I worked on had to deal with a telecommunication switch database. It had been developed with no FKs, because the team wanted inserts to be as fast as possible. Since the system itself had to handle live calls, that made some sense. At first there was no need for any intensive queries; if you wanted a report, you could use the switch's GUI software, which eventually offered some basic reports.

But by the time I was involved, they wanted to build an AI on top of it, to generate smart reports and provide something like automatic troubleshooting. It was a complete nightmare: with millions of records and no integrity guarantees, you couldn't execute any long query without hitting SQL Server timeouts, and using Entity Framework was out of the question. Facing a situation like this is very different from merely describing it. My advice is to be very deliberate in your design and to have a very good reason before leaving FKs out.

JohnnyJaxs