For a project, I need to keep track of a huge amount of variables. It needs to be logged, which user changes which variable etc. The table looks (simplified to one variable-type) like this (id is a surogate key to tell versioned entries apart):
id | variable_id | change_time | change_user | change_task | revision | value
1 1 some date 1 123 1 Hello World
2 2 some date 1 123 1 22.5
3 1 some date 2 456 2 Foo Bar
To retrieve the LATEST set of Revisions (here the entries 2 and 3) i use the solution described over here: Retrieving the last record in each group
The resulting query looks like this:
SELECT
v1.id,
v1.value
FROM
variable_history AS v1
LEFT JOIN
variable_hisotry AS v2
ON
(
v1.variable_id = v2.variable_id AND
v1.revision < v2.revision
)
WHERE
v2.id IS NULL
Now i need to modify that query, so that the following rules apply:
- Querying the Latest Variables should return row 2 and 3
- Querying the latest Variables for task 456 should return row 2 and 3
- Querying the latest Variables for task 123 should return row 1 and 2 (for historic purpose)
The tricky thing is, that not each Task is changing (updating) all the variables all the time, so i can't simple query for all variables of Task X. Instead i have to take variables for Task X, if there are any, but for all other variables the ones from ANY task < X BUT with the highest Revision. (And exclude Revisions of a later task)
For any case, the result shoud contain all variables. So in this small example, it always should have 2 rows.
Edit:
The Query
SELECT
v1.id,
v1.value
FROM
(SELECT * FROM variable_history WHERE change_task <= 123) AS v1
LEFT JOIN
(SELECT * FROM variable_history WHERE change_task <= 123) AS v2
ON
(
v1.variable_id = v2.variable_id AND
v1.revision < v2.revision
)
WHERE
v2.id IS NULL
Works as expected. However having about 10.000 Entrys daily i'm not happy with a double sub-select followed by a join...
http://sqlfiddle.com/#!2/cfda03/2
Edit: hmm, on the other side: each row of variable_history is also bound to an entity_id, and ofc. limiting the 2 sub selects to ONE entity in question will reduce the table size to sth. like 150 different variables... So shouldn't be a performance problem.