
What's the best way to have a calculated column that involves multiple tables in a transactional database?

Background

Our transactional MySQL database includes fact tables like those in the following simplified model:

CREATE TABLE LU_PROJECT(
    PROJECT_ID int,
    PROJECT_DESC varchar(60)
);

CREATE TABLE F_PROJECT_BUDGET(
    PROJECT_ID int,
    BUDGET numeric(15, 2)
);

CREATE TABLE F_TASK_BUDGET(
    PROJECT_ID int,
    TASK_ID int,
    BUDGET numeric(15, 2)
);

CREATE TABLE F_MONEY_USED(
    REPORTED_TIME datetime,
    PROJECT_ID int,
    TASK_ID int,
    MONEY_USED numeric(15, 2)
);

Task is a child of Project. Task IDs are not unique, but project/task pairs are unique.
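
To make those uniqueness rules concrete, the keys might look like this (a sketch only; it assumes a PROJECT_ID column on the task-level tables, which the pair-uniqueness rule implies, and the index name is made up):

```sql
-- Sketch: encode "project/task pairs are unique" as a composite primary
-- key, and index F_MONEY_USED to support per-project rollups.
ALTER TABLE LU_PROJECT       ADD PRIMARY KEY (PROJECT_ID);
ALTER TABLE F_PROJECT_BUDGET ADD PRIMARY KEY (PROJECT_ID);
ALTER TABLE F_TASK_BUDGET    ADD PRIMARY KEY (PROJECT_ID, TASK_ID);
CREATE INDEX IX_MONEY_USED_PROJECT ON F_MONEY_USED (PROJECT_ID, TASK_ID);
```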

Requirements

We need to preserve the invariant that the total budget for all tasks for a project is <= the budget for a project.
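
For reference, the invariant can be expressed as a query that returns any violating projects (a sketch against the simplified model, assuming F_TASK_BUDGET carries a PROJECT_ID column):

```sql
-- Projects whose summed task budgets exceed the project budget.
-- Should return zero rows whenever the invariant holds.
SELECT pb.PROJECT_ID,
       pb.BUDGET      AS PROJECT_BUDGET,
       SUM(tb.BUDGET) AS TOTAL_TASK_BUDGET
FROM   F_PROJECT_BUDGET pb
JOIN   F_TASK_BUDGET    tb ON tb.PROJECT_ID = pb.PROJECT_ID
GROUP BY pb.PROJECT_ID, pb.BUDGET
HAVING SUM(tb.BUDGET) > pb.BUDGET;
```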

In addition, we often need to run a query that returns a result set with the following columns:

PROJECT_ID, PROJECT_DESC, PROJECT_BUDGET, TASK_COUNT, PROJECT_MONEY_USED

Issues

We're concerned about performance. The simplest solution would require updates to hit three fact tables:

  1. Check that adding to F_MONEY_USED wouldn't put us over the budget for the task in F_TASK_BUDGET.
  2. Check that adding the money wouldn't put us over the total budget for the project in F_PROJECT_BUDGET.
  3. Write to F_MONEY_USED.
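
The three steps above could be wrapped in a single stored procedure that locks the budget rows before checking (a sketch only, assuming InnoDB, PROJECT_ID columns on the task-level tables, and a made-up procedure name; the real interface would differ):

```sql
DELIMITER //
CREATE PROCEDURE SPEND(IN p_project INT, IN p_task INT, IN p_amount DECIMAL(15,2))
BEGIN
    DECLARE v_task_budget DECIMAL(15,2);
    DECLARE v_proj_budget DECIMAL(15,2);
    DECLARE v_task_used   DECIMAL(15,2);
    DECLARE v_proj_used   DECIMAL(15,2);

    START TRANSACTION;

    -- Lock the budget rows so concurrent spenders serialize per task/project.
    SELECT BUDGET INTO v_task_budget
    FROM F_TASK_BUDGET
    WHERE PROJECT_ID = p_project AND TASK_ID = p_task
    FOR UPDATE;

    SELECT BUDGET INTO v_proj_budget
    FROM F_PROJECT_BUDGET
    WHERE PROJECT_ID = p_project
    FOR UPDATE;

    SELECT COALESCE(SUM(MONEY_USED), 0) INTO v_task_used
    FROM F_MONEY_USED
    WHERE PROJECT_ID = p_project AND TASK_ID = p_task;

    SELECT COALESCE(SUM(MONEY_USED), 0) INTO v_proj_used
    FROM F_MONEY_USED
    WHERE PROJECT_ID = p_project;

    IF v_task_used + p_amount > v_task_budget
       OR v_proj_used + p_amount > v_proj_budget THEN
        ROLLBACK;
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Over budget';
    ELSE
        INSERT INTO F_MONEY_USED (REPORTED_TIME, PROJECT_ID, TASK_ID, MONEY_USED)
        VALUES (NOW(), p_project, p_task, p_amount);
        COMMIT;
    END IF;
END //
DELIMITER ;
```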

And the query we need to get stats will hit three fact tables (in addition to lookup tables not shown in the model above):

  1. Join LU_PROJECT to F_PROJECT_BUDGET to get PROJECT_BUDGET, grouped by PROJECT_ID.
  2. Join LU_PROJECT to F_TASK_BUDGET to get TASK_COUNT, grouped by PROJECT_ID.
  3. Join LU_PROJECT to F_MONEY_USED to get PROJECT_MONEY_USED, grouped by PROJECT_ID.
  4. Join LU_PROJECT to the intermediate results above to pick up PROJECT_DESC as well.
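
Written out, that might look like the following (a sketch, assuming PROJECT_ID columns on both task-level fact tables):

```sql
-- One derived table per fact table, joined back to LU_PROJECT.
SELECT p.PROJECT_ID,
       p.PROJECT_DESC,
       pb.PROJECT_BUDGET,
       COALESCE(tb.TASK_COUNT, 0)         AS TASK_COUNT,
       COALESCE(mu.PROJECT_MONEY_USED, 0) AS PROJECT_MONEY_USED
FROM LU_PROJECT p
JOIN (SELECT PROJECT_ID, SUM(BUDGET) AS PROJECT_BUDGET
      FROM F_PROJECT_BUDGET GROUP BY PROJECT_ID) pb
  ON pb.PROJECT_ID = p.PROJECT_ID
LEFT JOIN (SELECT PROJECT_ID, COUNT(*) AS TASK_COUNT
           FROM F_TASK_BUDGET GROUP BY PROJECT_ID) tb
  ON tb.PROJECT_ID = p.PROJECT_ID
LEFT JOIN (SELECT PROJECT_ID, SUM(MONEY_USED) AS PROJECT_MONEY_USED
           FROM F_MONEY_USED GROUP BY PROJECT_ID) mu
  ON mu.PROJECT_ID = p.PROJECT_ID;
```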

The problem is that that's a lot of joining, and both reads and writes will happen frequently.

Potential Solution

One solution we are considering is to add a PROJECT_MONEY_USED column to F_PROJECT_BUDGET that is updated on every write to F_MONEY_USED. This will slow down writes but speed up reads.

  • The solution will also introduce complexity and data integrity concerns, since the fact tables will no longer be "at the grain." That violates data warehousing principles, but I haven't been able to find out whether it's kosher for transactional databases.

  • The slowdown in writes may not be much of a concern if we can do optimistic rendering in the UI, but that introduces more complexity.
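
Under that approach, each write would update the rollup column in the same transaction as the insert (a sketch, assuming the hypothetical PROJECT_MONEY_USED column exists and F_MONEY_USED carries PROJECT_ID; the literal values are examples only):

```sql
-- Sketch: keep the denormalized rollup in sync within one transaction.
START TRANSACTION;

INSERT INTO F_MONEY_USED (REPORTED_TIME, PROJECT_ID, TASK_ID, MONEY_USED)
VALUES (NOW(), 42, 7, 100.00);   -- 42, 7, 100.00 are example values

UPDATE F_PROJECT_BUDGET
SET    PROJECT_MONEY_USED = PROJECT_MONEY_USED + 100.00
WHERE  PROJECT_ID = 42;

COMMIT;
```

The stats query then needs no aggregation over F_MONEY_USED at all, which is where the read speedup comes from.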

Other solutions considered

  • For writes, we are considering using triggers to preserve the invariant.
  • For reads, calculated (generated) columns looked promising, but MySQL doesn't allow them to reference other tables.
  • For reads, materialized views are probably not an option, since the data needs to be fresh in real time.
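
The trigger idea might be sketched as follows (assumptions: MySQL 5.5+ for SIGNAL, the hypothetical PROJECT_MONEY_USED column from the potential solution above, and a PROJECT_ID column on F_MONEY_USED; trigger names are made up). Checking against the rollup column sidesteps MySQL's rule that a trigger can't read the table it fires on:

```sql
DELIMITER //

-- Reject inserts that would push the project over budget.
CREATE TRIGGER TRG_CHECK_BUDGET
BEFORE INSERT ON F_MONEY_USED
FOR EACH ROW
BEGIN
    DECLARE v_over INT;
    SELECT COUNT(*) INTO v_over
    FROM F_PROJECT_BUDGET
    WHERE PROJECT_ID = NEW.PROJECT_ID
      AND PROJECT_MONEY_USED + NEW.MONEY_USED > BUDGET;
    IF v_over > 0 THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Project over budget';
    END IF;
END //

-- Keep the denormalized rollup current.
CREATE TRIGGER TRG_ROLL_UP
AFTER INSERT ON F_MONEY_USED
FOR EACH ROW
    UPDATE F_PROJECT_BUDGET
    SET    PROJECT_MONEY_USED = PROJECT_MONEY_USED + NEW.MONEY_USED
    WHERE  PROJECT_ID = NEW.PROJECT_ID //

DELIMITER ;
```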

Summary

Is there a better solution for doing our reads and writes in a safe, simple, and performant manner?

Max Heiber
  • Can you not have a stored procedure that does the check and only updates if it's good? With decent indexes I wouldn't expect the joins to be that slow. Key point here is which performance are you worried about - client UI responsiveness or sheer load of multiple requests on the server? – Matt Allwood Jul 23 '15 at 16:49
  • @MattAllwood Thanks! We're most concerned about UI responsiveness. Yep, we plan to use Stored Procedures. – Max Heiber Jul 24 '15 at 15:45
  • I wouldn't be too worried about the performance unless it becomes a big issue, this stuff is pretty bread-and-butter for a DB so it should cope. Protect against over-spend by only updating via a stored proc that checks budget before updating. Not familiar with MySQL, but can you calculate from a view? Restricting the number of records returned (and needing to be calculated) by additional WHERE clauses can be added later if needed – Matt Allwood Jul 27 '15 at 12:34
  • @MattAllwood Thanks for your suggestions. MySQL has views, but I don't think they help with performance: the same SQL gets run against the database, it just simplifies how the queries look. Materializing the views isn't an option, even if MySQL can do it, because we need this to be soft real-time. – Max Heiber Jul 27 '15 at 13:48
  • Is the view that slow? Have you got correct indexes on the base tables? Possibly try restricting the data you return if possible with additional WHERE clauses. Unless you're returning a LOT of data, I wouldn't expect performance to be that big an issue with this system – Matt Allwood Jul 28 '15 at 10:03

0 Answers