2

I've got a fairly straightforward (I thought) data request from a very long table in SQL Server.

The table TMLogs contains thousands of data samples from machinery sensors connected to generators. These generators are connected to sensors that provide a snapshot of data on key generator performance components: RPM, Temperature, Sample Time, etc. The data snapshots are taken every 5 seconds, then stored in a SQL Server database. The generators are in different time zones, and the clocks aren't very accurate (+/- 10) minutes.

The columns look something like this:

Generator Name | Generator State | Fuel Tank Reading | Output (watts) | Sample Time (YYYY-DD-MM HH:MM:SS) | Engine Temp (F°) | Output Descriptor|....

My goal is to retrieve only the most recent sample that each generator has given.

Selecting the TOP X values only gives the generators in the eastern time zone, or the ones that have system clocks that are ahead of official time.

So, given the criteria Generator Name, how would I retrieve the newest data sample that each generator has stored?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
MRBags1964
  • 21
  • 1
  • 3
    cant you simply `select max([sample time]), [generator name] from tmlogs group by [generator name]` ? – Alex K. Dec 18 '15 at 16:15
  • Possible duplicate of [Select first row in each GROUP BY group?](http://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group) – Tab Alleman Dec 18 '15 at 16:17
  • Is there some identifying key or id in the row? Even is composite, like does generator name and sample time uniquely identify a row? – asantaballa Dec 18 '15 at 16:18
  • Is this query supposed to refresh every five seconds? That might be a factor in how you approach this. – shawnt00 Dec 18 '15 at 16:20
  • @shawnt00 - This query will not need to refershed that often. The table is updated that often, however the output of the query will only be updated every 5 minutes. – MRBags1964 Dec 18 '15 at 16:41
  • @AlexK. - Let me give that a shot. Certainly sounds logical. – MRBags1964 Dec 18 '15 at 16:43
  • @asantaballa - There is not a unique identifier for each row of data. There are foreign keys to relate this table to other tables. – MRBags1964 Dec 18 '15 at 16:45
  • @AlexK. - Using the `MAX(Sample Time)` and `Group By` `Generator Name` works, but only my query contains those columns. If I add an additional column, such as Temp°, the error message "Temp is invalid in the select list because it is not contained in either an aggregate function or the Group By clause". – MRBags1964 Dec 18 '15 at 17:03

2 Answers2

3

Here are two approaches you can take. Both use the sample data defined here:

-- Let's create a sample table to experiment with.
DECLARE @Sample TABLE
    (
        GeneratorId     INT,
        SampleTime      DATETIME,
        Measure1        INT         
    )
;

-- Populate it.
INSERT INTO @Sample
    (
        GeneratorId,
        SampleTime,
        Measure1
    )
VALUES
    (1, '2015-01-01 09:00:00.000', 100),
    (1, '2015-01-01 09:05:00.000', 90),
    (2, '2015-09-01 10:00:00.000', 100),
    (2, '2015-09-01 10:05:00.000', 90)
;

This first example uses ROW_NUMBER. This windowed function is great for sequential numbering records. In this case the most recent sample for each generator is assigned a 1.

/* The inner query ranks our records.
 * The outer query filters for the most recent.
 * Windowed functions cannot be used in the where clause, hence the need for a subquery.
 */
SELECT
    *
FROM
    (
        /* ROW_NUMBER assigns a rank to each record.
         * Each GeneratorId is ranked separately, thanks 
         * to the PARTITION BY.
         */
        SELECT
            ROW_NUMBER() OVER(PARTITION BY GeneratorId ORDER BY SampleTime) AS RowNum,
            GeneratorId,
            SampleTime,
            Measure1
        FROM
            @Sample
    ) AS r
WHERE
    RowNum = 1
;

Or you could use a CTE. This allows us to first find the most recent sample, by generator. Then filter for just those.

/* We can also use a CTE to achive the same result.
 * You could just as easily make the CTE a subquery
 * in the main body.
 */
WITH MostRecent AS
    (
        /* This CTE finds the most recent sample for each
         * generator.  This is used to filer the main query.
         */
        SELECT
            GeneratorId,
            MAX(SampleTime) AS MaxSampleTime
        FROM
            @Sample
        GROUP BY
            GeneratorId
    )
SELECT
    *
FROM
    @Sample AS s
        INNER JOIN MostRecent AS mr     ON  mr.GeneratorId      = s.GeneratorId
                                        AND mr.MaxSampleTime    = s.SampleTime
;

This 2nd example could be reformatted as a subquery.

David Rushton
  • 4,915
  • 1
  • 17
  • 31
1

I haven't fully tested this, but I'd try approaching it with something like this.

SELECT b.*
FROM
(
select max([sample time]), 
[generator name] from tmlogs 
group by [generator name]
) a
inner join tmlogs b
on a.time = b.[sample time] and a.[generator name] = b.[generator name]

Basically if you can get the results you want (first query, aliased as "a"), you can then just join the main table (aliased as "b"), on whatever columns from "a".

Eric
  • 2,273
  • 2
  • 29
  • 44
  • Appreciate it! I won't be able to try until Monday, but I have a feeling this will work, given the results I got using the `Select MAX()` before. – MRBags1964 Dec 19 '15 at 16:52
  • @MRBags1964 terrific. If you find answers on here useful, feel free to upvote them, and if they solve your problem mark them as the answer. – Eric Dec 19 '15 at 16:57
  • Will do! Upvotes for all! I've browsed Stack before, but never submitted a question. I can't emphasize how important and helpful online communities like this are! – MRBags1964 Dec 20 '15 at 17:42