You have a few competing factors here:
You should use a SmallDateTime
, DateTime2
or DateTimeOffset
typed columns to store the actual time in the log, these types allow for calculating the differece between values using DateDiff()
and DateAdd()
and other date/time based comparison logic, where as Timestamp
is designed to be used as a currency token, you can use it to determine if one record is more recent than another, you shouldn't try to use it to determine the actual time of the event.
You have not explained the expected workflow, we can only assume that the flow is [OPEN]=>[In Progress]=>[CLAIM FIXED]. There is also no mention of 'In Progress', which we assume is an interim state. What actually happens here is that this structure can really only tell you the time spent in the 'In Progress' state, which is probably OK for your needs as this is the time spent actually working, but it is important to recognise that we do not know when the bug is changed to 'OPEN' in the first place, unless that is also logged but we need to see the data to explain that.
Your example dataset does not cover enough combinations for you to notice that the existing logic will fail as soon as you add more than 1 bug. What is more you have asked to calculate the number of hours, but your example data only shows a variation minutes and has no example where the bug is completed at all.
- Without a realistic set of data to test with, you will find it hard to debug your logic and hard to accept that it actually works before you execute this against a larger dataaset. It can help to have a scripted scenario, much like your post here, but you should create the data to reflect that script.
- You use
'FIXED'
in your example, but 'CLAIM FIXED'
in query, so which one is it?
Step 1: Structure
Change the datatype of CurrentTime
to a DateTime based column. Your application logic may drive requirements here. If your system is cloud based or international, then you may see benefits from using DateTimeOffset
instead of having to convert into UTC, otherwise if you do not need high precision timing in your logs, it is very common to use SmallDateTime
for logging.
- Many ORM and application frameworks will allow you to configure a DateTime based column as the concurrency token, it you need one at all. If you are not happy using a lower precision value for concurrency, then you could have the two columns side by side, to compare the time difference between two records, we need to use a DateTime based type.
- In the case of log, we rarely allow or expect logs to be edited, if your logs are read-only then having a concurrency token at all may not be necessary, especially if you only use the concurrency token to determine concurrency during edits of individual records.
NOTE: You should consider using an enum or FK for the Status concept. Already in your example dataset there was a typo for 'In Progerss'
, using a numeric comparison for the status may provide some performance benefits but it will help to prevent spelling mistakes, especially when FK or lookup lists are used from any application logic.
Step 2: Example Data
If the requirement is to calculate the number of hours spent between records, then we need to create some simple examples that show a difference of a few hours, and then add some examples where the same bug is opened, fixed and then re-opened.
- bug #1 was fixed in 2 hours on 2020-01-01, then reopened and got fixed in 3 hours on 2020-12-12
The following table shows the known data states and the expected hrs, we need to throw in a few more data stories to validate that the end query handles obvious boundary conditions like multiple Bugs and overlapping dates
BUG # |
Time |
Previous State |
New State |
Hrs In Progress |
1 |
2020-01-01 08:00:00 |
OPEN |
In Progress |
|
1 |
2020-01-01 10:00:00 |
In Progress |
FIXED |
(2 hrs) |
1 |
2020-12-10 09:00:00 |
FIXED |
OPEN |
|
1 |
2020-12-12 9:30:00 |
OPEN |
In Progress |
|
1 |
2020-12-12 12:30:00 |
In Progress |
FIXED |
(3 hrs) |
2 |
2020-03-17 11:15:00 |
OPEN |
In Progress |
|
2 |
2020-03-17 14:30:00 |
In Progress |
FIXED |
(3.25 hrs) |
3 |
2020-08-22 10:00:00 |
OPEN |
In Progress |
|
3 |
2020-08-22 16:30:00 |
In Progress |
FIXED |
(6.5 hrs) |
Step 3: Query
What is interesting to notice here is that 'In Progress'
is actually the significant state to query against. What we actually want is to see all rows where the OldStatus
is 'In Progress'
and we want to link that row to the most recent record before this one with the same BugID
and with a NewStatus
equal to 'In Progress'
What is interesting in the above table is that not all the expected hours are whole numbers (integers) which makes using DateDiff
a little bit tricky because it only counts the boundary changes, not the total number of hours. to highlight this, look at the next two queries, the first one represents 59 minutes, the other only 2 minutes:
SELECT DateDiff(HOUR, '2020-01-01 08:00:00', '2020-01-01 08:59:00') -- 0 (59m)
SELECT DateDiff(HOUR, '2020-01-01 08:59:00', '2020-01-01 09:01:00') -- 1 (1m)
However the SQL results show the first query as 0
hours, but the second query returns 1
hour. That is because it only compares the HOUR
column, it is not actually doing a subtraction of the time value at all.
To work around this, we can use MINUTE
or MI
as the date part argument and divide the result by 60.
SELECT CAST(ROUND(DateDiff(MI, '2020-01-01 08:00:00', '2020-01-01 08:59:00')/60.0,2) as Numeric(10,2)) -- 0.98
SELECT CAST(ROUND(DateDiff(MI, '2020-01-01 08:59:00', '2020-01-01 09:01:00')/60.0,2) as Numeric(10,2)) -- 0.03
You can choose to format this in other ways by calculating the modulo to get the minutes in whole numbers instead of a fraction but that is out of scope for this post, understanding the limitations of DateDiff
is what is important to take this further.
There are a number of ways to correlate a previous record within the same table, if you need other values form the record then you might use a join with a sub-query to return the TOP 1
from all the records before the current one, you could use window queries or a CROSS APPLY
to perform a nested lookup. The following uses CROSS APPLY
which is NOT standard across all RDBMS but I feel it keeps MS SQL queries really clean:
SELECT [Fixed].BugID, [start_time], [Fixed].[CurrentTime] as [finish_time]
, DATEDIFF(MI, [start_time], [Fixed].[CurrentTime]) / 60 AS Time_Spent_Hr
, DATEDIFF(MI, [start_time], [Fixed].[CurrentTime]) % 60 AS Time_Spent_Min
FROM Log as Fixed
CROSS APPLY (SELECT MAX(CurrentTime) AS start_time
FROM Log as Started
WHERE Fixed.BugID = Started.BugID
AND Started.NewStatus = 'In Progress'
AND CurrentTime < Fixed.CurrentTime) as Started
WHERE Fixed.OldStatus = 'In Progress'
You can play with this fiddle: http://sqlfiddle.com/#!18/c408d4/3
However the results show this:
BugID |
start_time |
finish_time |
Time_Spent_Hr |
Time_Spent_Min |
1 |
2020-01-01T08:00:00Z |
2020-01-01T10:00:00Z |
2 |
0 |
1 |
2020-12-12T09:30:00Z |
2020-12-12T12:30:00Z |
3 |
0 |
2 |
2020-03-17T11:15:00Z |
2020-03-17T14:30:00Z |
3 |
15 |
3 |
2020-08-22T10:00:00Z |
2020-08-22T16:30:00Z |
6 |
30 |