0

In this example, I am collecting some engine data on a car.

Variables
--------------------------------------
id  | name
--------------------------------------
1     Headlights On 
2     Tire Pressure
3     Speed
4     Engine Runtime in Seconds
...


Values
--------------------------------------
id  |  var_id  | value  | time
--------------------------------------
1      1          1       2013-05-28 16:42:00.100
2      1          0       2013-05-28 16:42:22.150
3      2          32.0    2013-05-28 16:42:22.153
4      3          65      2013-05-28 16:42:22.155
...

I want to write a query that returns a result set something like the following:

Input: 1,2,3


Time                | Headlights On | Tire Pressure | Speed
---------------------------------------------------------------
2013-05-28 16:42:00   1                                  
2013-05-28 16:42:22   0               32              65

Being able to modify the query to include only results for a given set of variables and at a specified interval say (1 second, 1 minute or 5 minutes) are also really important for my use case.

How do you write a query in T-SQL that will return a time-aggregated multi column result set at a specific interval?

Chris G.
  • 3,963
  • 2
  • 21
  • 40
  • Righting a query that returns a "time-aggregated multi-column result set at a specific interval" is not a problem. Writing a query that has variable column names *is* a problem. Doing what you want requires dynamic SQL, because the column names are stored in a separate table. Is that really a requirement? – Gordon Linoff Jun 03 '13 at 21:17
  • Figured it might. I could generate a multi column table from the result set with a client application, but I didn't want to go there yet as I haven't started working on my client application yet. Just trying to get a better sense of whats in the database while I test my little OBD reader device. – Chris G. Jun 03 '13 at 21:42

2 Answers2

1

Try

SELECT
    VehicleID
    , Case WHEN Name = 'Headlights on' THEN 1
     Else 0 END ' as [Headlights on]
    , Case WHEN Name = 'Tyre pressure' THEN Value
     Else CAST( NULL AS REAL) END ' as [Tyre pressure]
 , DateName(Year, DateField) [year ]
FROM
   Table

ETC

Then agrregate as required

SELECT
      VehicleID
      , SUM([Headlights on]) SUM([Headlights on],
FROM 
(
QUery above

) S
GROUP BY
VehicleID
, [Year]
Ian P
  • 1,724
  • 1
  • 10
  • 12
  • This kind of gave me some kind of clue but isn't really what I'm looking for. Looking at the tables in my example, I'm recording values over time for a set of variables. I don't want to sum up the truthfulness of Headlights_On for a time period, I want to display the values of a provided set of variables in a crosstab-like format over a time interval. – Chris G. Jun 03 '13 at 19:43
  • Well there are quite a few agregate functions, if you want 1 where its true (even multiple times) then just use MAX(binary_column) instead or MAX(NumericColumnEgTyrePressure). Provided you group by (VEhicleId and TimePeriod) then this is exactly what you ask for as I understand your question – Ian P Jun 04 '13 at 08:22
1

1 minute aggregate:

SELECT {edit: aggregate functions over fields here} FROM Values WHERE {blah} GROUP BY DATEPART (minute, time);

5 minute aggregate:

SELECT {edit: aggregate functions over fields here} FROM Values WHERE {blah} GROUP BY
  DATEPART(YEAR, time),
  DATEPART(MONTH, time),
  DATEPART(DAY, time),
  DATEPART(HOUR, time),
  (DATEPART(MINUTE, time) / 5);

For the reason this latter part is so convoluded, please see the SO post here: How to group time by hour or by 10 minutes .

Edit 1: For the part "include only results for a given set of variables", my interpretation is that you want to to isolate Values with var_id being within a specified set. If you can rely on the variable numbers/meanings not changing, the common SQL solution is the IN keyword (http://msdn.microsoft.com/en-us/library/ms177682.aspx).

This is what you would put into the WHERE clause above, e.g.

... WHERE var_id IN (2, 4) ...

If you can't rely on knowing the variable numbers but are certain about their names, you can replace the set by a sub-query, e.g.:

... WHERE var_id IN (SELECT id FROM Variables WHERE name IN ('Tire Pressure','Headlights On')) ...

The alternative interpretation is that you actually want to aggregate based on the variable ids as well. In this case, you'll have to include the var_id in your GROUP BY clause.

To make the results more crosstab-like, I guess you'll want to order by time aggregate that you're using. Hope that helps more.

Community
  • 1
  • 1
Greg Kramida
  • 4,064
  • 5
  • 30
  • 46
  • This tells me how to set up a time interval, thanks for the link in the post. It's helpful. I still don't know how to set up the crosstab-like result set in my question, though. It's not as simple as `WHERE {blah}`, I think. – Chris G. Jun 03 '13 at 19:44
  • 1
    Perhaps my edit will help. Still not sure if my interpretation of what you're looking for is correct, though. – Greg Kramida Jun 03 '13 at 20:01
  • This got me in the right direction. Still learning MSSQL. This SQL query gives me an error. Since we're grouping by time, the `SELECT *` will yield the 'is not contained in aggregate function` error. Thing is, I don't want to contain it... max(), min(), etc are not relevant here - I just want the actual value for the row with that time. – Chris G. Jun 04 '13 at 13:20
  • 1
    Yes, that's true, you're absolutely right. You'll need to use some aggregate functions on each field you want except for the one's you're aggregating on, i.e. AVG(value), MIN(var_id) and the like. The rows won't have direct correspondence to the rows in the original table, so it doesn't make much sense to aggregate the "id" column. – Greg Kramida Jun 04 '13 at 13:30
  • This really screws up my logged data though. I want the exact value of Headlights_On and Speed at 5 minute intervals. I don't want the average or the minimum value nor do I care what happens in between the 5 minute intervals... – Chris G. Jun 04 '13 at 13:36
  • Here's a pastie. http://pastebin.com/et7QZrkt you'll see the first 6 rows don't have a 5 minute interval but the subsequent ones do. Also the MAX(value) isn't actually giving me the maximum recorded value in that time period, not sure why. – Chris G. Jun 04 '13 at 13:47
  • 1
    I'm guessing there was no data past 13:20 at the time of query, which would mean that the 13:19 aggregate is actually for 13:15, 13:16, 13:17, 13:18, and 13:19. The 13:14 chunk seems also correct unless I'm missing something, and the 13:20 is just the truncated time segment with one minute only. – Greg Kramida Jun 04 '13 at 14:13
  • Understand about the first chunk now. Looks like its grouping correctly then, but the values aren't being MAX()'d correctly and really I don't even want the MAX value in those 5 minutes, i want the actual value at the 5 minute mark. – Chris G. Jun 04 '13 at 14:15
  • I see. That would make the query a bit more complex, since then you'd have to subquery on the times only, i.e.: SELECT * FROM Values WHERE time IN (SELECT {aggregated time value} GROUP BY {aggregated time value}) AND ... ;. That would make the query skip the values not occurring at precisely the 5-minute-intervals (you could have some intervals missing if there isn't a value exactly at the 5-minute-mark.) Also not sure if MAX(time) is going to cut it, since the "/5" actually truncates the minute value, doesn't take the ceiling. Maybe, MIN(time) would work better, not sure. – Greg Kramida Jun 04 '13 at 14:21
  • Also it would be hard to control where your intervals occur, i.e. say the very first value it looks at is 1 minute away from the next, while the subsequent ones all have 5-minute offset from each-other. That would actually make the query miss a whole bunch of data. I'm trying to think of a "sequence" way to do this, perhaps that would work better. – Greg Kramida Jun 04 '13 at 14:23
  • The problem with sequences is that unless you have exact information about when the values are supposed to occur, e.g. you have the exact offset at which data is collected and the start of the count, you still don't know if the dates generated by a date sequence would hit upon the needed values. If you do have that information, you can manipulate your time aggregate by adding a fixed minute offset, without unduly complicating things. I hope this helps somewhat. – Greg Kramida Jun 04 '13 at 14:28