0

I would imagine this question has been asked at some point before. Unfortunately despite a reasonable amount of searching I haven't managed to find an instance of the question I'm looking for.

In summary the question is: is it better for performance to cache a bit value in a column when it's reasonably easy to do and then use that as part of the select statement when returning hundreds or thousands of records or do a join to another table and query the table for the resulting bit value?

I've got a table called client, and then an associated table called client payments linked by the clientid. When the user lists all of their clients (there are ~200,000 clients/~3000 users) I need the user to be able to see whether any of the client payments are still outstanding for each client. That's pretty easy. I currently use this when selecting the client (entity framework being used):

return query.Select(ci => new ClientSummary {
[...]
HasFuturePayments = ci.ClientPayments.Any(c=>c.PaidDate==null)
}

That's all pretty simple. And will work fine.

But I could do this in a different way. Would appreciate advice on whether the second option is better from a performance perspective?

When I save the client payments I could potentially indicate in the client table that there are future payments owing by adding a bit value. So then the select would be this:

return query.Select(ci => new ClientSummary {
[...]
HasFuturePayments = ci.HasFuturePayments
}

This obviously means potentially more storage for the client. I've got 3 bit values in the client table so at the moment it's only taking up one byte. But it means that it's not having to go and query the associated client payments table in order to find out. Given that any user could have a few hundred or even thousands of clients - often returned in one go, is the performance improvement of not querying another table as part of the select worth the potential of taking up one extra bit of space in the client table (given we currently have ~200,000 and I would imagine will be over 500,000 in a couple of years)?

For me from an implementation perspective it doesn't really matter either way. I just want to provide the fastest possible database performance.

Hope this makes sense!

Thanks, Stefan

Stefan
  • 134
  • 1
  • 4
  • 13
  • 1
    My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless *you* define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". https://meta.stackexchange.com/q/204461 – philipxy Jul 29 '18 at 07:43
  • See also [Should this information be calculated in real time or stored in a seperate database?](https://stackoverflow.com/a/24377557/3404097) & its link [Using update and insert triggers to provide a count column in other relations](https://stackoverflow.com/q/24193410/3404097) – philipxy Jul 29 '18 at 08:01
  • 1
    If you have a perormance issue you should _maybe_ consider it. If you _don't_ have a performance issue, don't consider it. – Nick.Mc Jul 29 '18 at 09:09

0 Answers0