1

I have a table (Postgres 9.3) defined as follows:

 CREATE TABLE tsrs (
     id SERIAL PRIMARY KEY,
     customer_id INTEGER NOT NULL REFERENCES customers,
     timestamp TIMESTAMP WITHOUT TIME ZONE,
     licensekeys_checksum VARCHAR(32));

The pertinent details here are the customer_id, the timestamp, and the licensekeys_checksum. There can be multiple entries with the same customer_id, some of those may have matching licensekey_checksum entries, and some may be different. There will never be rows with equal checksum and equal timestamps.

I want to return a table containing 1 row for each group of rows with matching licensekeys_checksum entries. The row returned for each group should be the one with the newest / most recent timestamp.

Sample Input:

1, 2, 2014-08-21 16:03:35, 3FF2561A
2, 2, 2014-08-22 10:00:41, 3FF2561A
2, 2, 2014-06-10 10:00:41, 081AB3CA
3, 5, 2014-02-01 12:03:23, 299AFF90
4, 5, 2013-12-13 08:14:26, 299AFF90
5, 6, 2013-09-09 18:21:53, 49FFA891

Desired Output:

2, 2, 2014-08-22 10:00:41, 3FF2561A
2, 2, 2014-06-10 10:00:41, 081AB3CA
3, 5, 2014-02-01 12:03:23, 299AFF90
5, 6, 2013-09-09 18:21:53, 49FFA891

I have managed to piece together a query based on the comments below, and hours of searching on the internet. :)

select * from tsrs
inner join (
   select licensekeys_checksum, max(timestamp) as mts
   from tsrs
   group by licensekeys_checksum
   ) x on x.licensekeys_checksum = tsrs.licensekeys_checksum
      and x.mts = tsrs.timestamp;

It seems to work, but I am unsure. Am I on the right track?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
JBeFat
  • 907
  • 10
  • 20
  • Use GROUP BY and MAX. – jarlh Jan 15 '15 at 10:13
  • 1
    Are there records with equal checksum **and** equal timestamps ? (BTW: `timestamp` is a bad name for a column) – joop Jan 15 '15 at 10:16
  • @joop There will never be records with equal checksum and equal timestamps. Why is timestamp a bad name for a column? I'm quite new to all this, so would appreciate any knowledge you have to give :) – JBeFat Jan 15 '15 at 10:27
  • 2
    Because it is also the name of a data type. This could confuse parsers, frameworks, or people. – joop Jan 15 '15 at 10:39
  • TIMESTAMP is a reserved word in ANSI SQL, so it won't work with many dbms products - unless you double quote it, as "timestamp". (And since it's a reserved word, it may also be reserved in a future version of postgresql, then your applications will no longer work.) – jarlh Jan 15 '15 at 10:52

3 Answers3

5

Your query in the question should perform better than the queries in the (previously) accepted answer. Test with EXPLAIN ANALYZE.

DISTINCT ON is typically simpler and faster:

SELECT DISTINCT ON (licensekeys_checksum) *
FROM   tsrs
ORDER  BY licensekeys_checksum, timestamp DESC NULLS LAST;

db<>fiddle here
Old sqlfiddle

Detailed explanation:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
2

Try this

select * 
from tsrs
where (timestamp,licensekeys_checksum) in (
                                          select max(timestamp)
                                                ,licensekeys_checksum
                                          from tsrs 
                                          group by licensekeys_checksum) 

>SqlFiddle Demo

or

with cte as (
            select id
                   ,customer_id
                   ,timestamp
                   ,licensekeys_checksum
                   ,row_number () over (partition by  licensekeys_checksum  ORDER BY timestamp DESC) as rk
            from  tsrs)
select  id
       ,customer_id
       ,timestamp
       ,licensekeys_checksum  
from cte where rk=1 order by id

>SqlFiddle Demo


Reference : Window Functions, row_number(), and CTE

Vivek S.
  • 19,945
  • 7
  • 68
  • 85
  • 1
    This returns the same results as my query above, but is far more understandable (for me at least :). I will use this version instead. Many thanks! – JBeFat Jan 15 '15 at 11:21
2

Alternative deduplication, using NOT EXISTS(...)

SELECT *
FROM tsrs t
WHERE NOT EXISTS (
    SELECT *
    FROM tsrs x
    WHERE x.customer_id = t.customer_id                  -- same customer
    AND x.licensekeys_checksum = t.licensekeys_checksum  -- same checksum
    AND x.ztimestamp > t.ztimestamp                      -- but more recent
    );
joop
  • 4,330
  • 1
  • 15
  • 26