What's the best way to have a calculated column that involves multiple tables in a transactional database?
Background
Our transactional MySQL database includes the fact tables like the following simplified model:
CREATE TABLE LU_PROJECT(
PROJECT_ID int,
PROJECT_DESC varchar(60)
);
CREATE TABLE F_PROJECT_BUDGET(
PROJECT_ID int,
BUDGET numeric(15, 2)
);
CREATE TABLE F_TASK_BUDGET(
TASK_ID int,
BUDGET numeric(15,2)
);
CREATE TABLE F_MONEY_USED(
REPORTED_TIME datetime,
TASK_ID int,
MONEY_USED numeric(15, 2)
);
Task
is a child of Project
. Task IDs are not unique, but project/task pairs are unique.
Requirements
We need to preserve the invariant that the total budget for all tasks for a project is <= the budget for a project.
In addition, we often need to run a query that returns a result set with the following columns:
PROJECT_ID
, PROJET_DESC
, PROJECT_BUDGET
, TASK_COUNT
, PROJECT_MONEY_USED
Issues
We're concerned about performance. The simplest solution would require updates to hit three fact tables:
- Check that adding to
F_MONEY_USED
wouldn't put us over the budget for the task inF_TASK_BUDGET
. - Check that adding the money wouldn't put us over the total budget for the project
- Write to
F_MONEY_USED
.
And the query we need to get stats will hit three fact tables (in addition to lookup tables, not in the model above):
- Join
LU_PROJECT
toPROJECT_BUDGET
fromF_PROJECT_BUDGET
group byPROJECT_ID
- Join
LU_PROJECT
toF_TASK_BUDGET
to getTASK_COUNT
group byPROJECT_ID
- Join
LU_PROJECT
to F_MONEY_USED
to getPROJECT_MONEY_USED
group byPROJECT_ID
- JOIN
LU_PROJECT
to the intermediate results above and get thePROJECT_DESC
as well.
The problem is that that's a lot of joining and both reads and writes will happen frequently.
Potential Solution
One solution we are considering is to add a PROJECT_MONEY_USED
field to F_PROJECT_BUDGET
that will be updated on writes to F_TASK_BUDGET
. This will slow down writes but speed up reads.
The solution will also introduce complexity and data integrity concerns, since the fact tables will no longer be "at the grain." That violates data warehousing principles, but I can't find out whether it's kosher for transactional databases.
The slowdown in writes may not be much of a concern if we can do optimistic rendering in the UI, but that introduces more complexity.
Other solutions considered
- For writes, we are considering using triggers to preserve the invariant.
- For reads, calculated columns looked promising, but they aren't allowed to hit multiple tables in MySQL.
- For reads, materialized views are probably not an option, the data needs to be fresh in real time.
Summary
Is there a better solution for doing our reads and writes in a safe, simple, and performant manner?