1

For a project, I need to keep track of a huge amount of variables. It needs to be logged, which user changes which variable etc. The table looks (simplified to one variable-type) like this (id is a surogate key to tell versioned entries apart):

id | variable_id | change_time | change_user | change_task | revision | value
1    1             some date     1             123           1          Hello World
2    2             some date     1             123           1          22.5
3    1             some date     2             456           2          Foo Bar

To retrieve the LATEST set of Revisions (here the entries 2 and 3) i use the solution described over here: Retrieving the last record in each group

The resulting query looks like this:

SELECT 
   v1.id,
   v1.value 
FROM 
  variable_history AS v1
LEFT JOIN 
  variable_hisotry AS v2 
ON 
  (
    v1.variable_id = v2.variable_id AND 
    v1.revision < v2.revision
  ) 
WHERE 
  v2.id IS NULL

Now i need to modify that query, so that the following rules apply:

  • Querying the Latest Variables should return row 2 and 3
  • Querying the latest Variables for task 456 should return row 2 and 3
  • Querying the latest Variables for task 123 should return row 1 and 2 (for historic purpose)

The tricky thing is, that not each Task is changing (updating) all the variables all the time, so i can't simple query for all variables of Task X. Instead i have to take variables for Task X, if there are any, but for all other variables the ones from ANY task < X BUT with the highest Revision. (And exclude Revisions of a later task)

For any case, the result shoud contain all variables. So in this small example, it always should have 2 rows.

Edit:


The Query

 SELECT 
       v1.id,
       v1.value 
    FROM 
      (SELECT * FROM variable_history WHERE change_task <= 123) AS v1
    LEFT JOIN 
      (SELECT * FROM variable_history WHERE change_task <= 123) AS v2 
    ON 
      (
        v1.variable_id = v2.variable_id AND 
        v1.revision < v2.revision
      ) 
    WHERE 
      v2.id IS NULL

Works as expected. However having about 10.000 Entrys daily i'm not happy with a double sub-select followed by a join...

http://sqlfiddle.com/#!2/cfda03/2

Edit: hmm, on the other side: each row of variable_history is also bound to an entity_id, and ofc. limiting the 2 sub selects to ONE entity in question will reduce the table size to sth. like 150 different variables... So shouldn't be a performance problem.

Community
  • 1
  • 1
dognose
  • 20,360
  • 9
  • 61
  • 107

1 Answers1

1
SELECT v.*
FROM   (
         SELECT   variable_id, MAX(revision) revision
         FROM     variable_history
         GROUP BY variable_id
       ) very_latest LEFT JOIN (
         SELECT   variable_id, MAX(revision) revision
         FROM     variable_history
         WHERE    change_task = ?
         GROUP BY variable_id
       ) task_latest USING (variable_id)
  JOIN variable_history v
    ON v.variable_id = very_latest.variable_id
   AND v.revision = COALESCE(task_latest.revision, very_latest.revision)
eggyal
  • 122,705
  • 18
  • 212
  • 237
  • The fiddle comes quite close, but querying for task 123 should not contain any variables from a LATER task, if there is NO Revision from a prior task. See this modified Fiddle, where Variable 3 should just be contained in the first query. http://sqlfiddle.com/#!2/035a71/1 okay, thats no big deal i think, but see my edit about performance... Oh and thanks for the fiddle - was working on that too, but you were faster :0) – dognose Mar 18 '13 at 10:36