I would imagine this question has been asked at some point before. Unfortunately despite a reasonable amount of searching I haven't managed to find an instance of the question I'm looking for.
In summary the question is: is it better for performance to cache a bit value in a column when it's reasonably easy to do and then use that as part of the select statement when returning hundreds or thousands of records or do a join to another table and query the table for the resulting bit value?
I've got a table called client, and then an associated table called client payments linked by the clientid. When the user lists all of their clients (there are ~200,000 clients/~3000 users) I need the user to be able to see whether any of the client payments are still outstanding for each client. That's pretty easy. I currently use this when selecting the client (entity framework being used):
return query.Select(ci => new ClientSummary {
[...]
HasFuturePayments = ci.ClientPayments.Any(c=>c.PaidDate==null)
}
That's all pretty simple. And will work fine.
But I could do this in a different way. Would appreciate advice on whether the second option is better from a performance perspective?
When I save the client payments I could potentially indicate in the client table that there are future payments owing by adding a bit value. So then the select would be this:
return query.Select(ci => new ClientSummary {
[...]
HasFuturePayments = ci.HasFuturePayments
}
This obviously means potentially more storage for the client. I've got 3 bit values in the client table so at the moment it's only taking up one byte. But it means that it's not having to go and query the associated client payments table in order to find out. Given that any user could have a few hundred or even thousands of clients - often returned in one go, is the performance improvement of not querying another table as part of the select worth the potential of taking up one extra bit of space in the client table (given we currently have ~200,000 and I would imagine will be over 500,000 in a couple of years)?
For me from an implementation perspective it doesn't really matter either way. I just want to provide the fastest possible database performance.
Hope this makes sense!
Thanks, Stefan