I have a table of transactional data containing forecasts for the future. The same forecast, identified by the same date, type, location and product, is read in multiple times, because forecasts are resent as they become more accurate over time.
I would like to write a query that groups the transactions with the same type, location, product and date, and then selects from each group only the row with the newest updated_at timestamp.
The table now has hundreds of thousands of rows and will grow to millions over time, so a reasonably efficient solution would be appreciated :)
Example table:
 date       | location_code | product_code | quantity | type     | updated_at
------------+---------------+--------------+----------+----------+------------
 2013-02-04 | ABC           | 123          | -26.421  | TRANSFER | 2013-01-12
 2013-02-07 | ABC           | 123          | -48.1    | SALE     | 2013-01-10
 2013-02-06 | BCD           | 234          | -58.107  | SALE     | 2013-01-11
 2013-02-06 | BCD           | 234          | -60      | SALE     | 2013-01-10
 2013-02-04 | ABC           | 123          | -6.727   | TRANSFER | 2013-01-10
The desired result:
 date       | location_code | product_code | quantity | type     | updated_at
------------+---------------+--------------+----------+----------+------------
 2013-02-04 | ABC           | 123          | -26.421  | TRANSFER | 2013-01-12
 2013-02-07 | ABC           | 123          | -48.1    | SALE     | 2013-01-10
 2013-02-06 | BCD           | 234          | -58.107  | SALE     | 2013-01-11
For example, I tried:
SELECT t.date, t.location_code, t.product_code, t.quantity, t.type, t.updated_at
FROM transactions t
INNER JOIN
(
    SELECT MAX(updated_at) AS max_updated_at
    FROM transactions
    GROUP BY product_code, location_code, type, date
) s ON t.updated_at = s.max_updated_at;
But this takes ages to run and does not seem to return the right rows.
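One likely cause: the subquery returns only max_updated_at, so the join has nothing to tie a row back to its own group. Any row whose updated_at equals the maximum of any group gets through, and on the sample data that is all five rows. Below is a minimal sketch of a variant that also joins on the grouping columns. The column types in the CREATE TABLE are assumptions made only so the example is self-contained, and the index name is hypothetical; I have not benchmarked this on a large table.

```sql
-- Assumed schema, only so the sketch can be run on its own.
CREATE TABLE transactions (
    date          DATE,
    location_code TEXT,
    product_code  TEXT,
    quantity      NUMERIC,
    type          TEXT,
    updated_at    DATE
);

-- The sample rows from the example table.
INSERT INTO transactions VALUES ('2013-02-04', 'ABC', '123', -26.421, 'TRANSFER', '2013-01-12');
INSERT INTO transactions VALUES ('2013-02-07', 'ABC', '123', -48.1,   'SALE',     '2013-01-10');
INSERT INTO transactions VALUES ('2013-02-06', 'BCD', '234', -58.107, 'SALE',     '2013-01-11');
INSERT INTO transactions VALUES ('2013-02-06', 'BCD', '234', -60,     'SALE',     '2013-01-10');
INSERT INTO transactions VALUES ('2013-02-04', 'ABC', '123', -6.727,  'TRANSFER', '2013-01-10');

-- For every (date, location_code, product_code, type) group, keep only
-- the row(s) whose updated_at equals that group's newest updated_at.
SELECT t.date, t.location_code, t.product_code, t.quantity, t.type, t.updated_at
FROM transactions t
INNER JOIN (
    SELECT date, location_code, product_code, type,
           MAX(updated_at) AS max_updated_at
    FROM transactions
    GROUP BY date, location_code, product_code, type
) s
    ON  t.date          = s.date
    AND t.location_code = s.location_code
    AND t.product_code  = s.product_code
    AND t.type          = s.type
    AND t.updated_at    = s.max_updated_at;

-- Hypothetical index covering the grouping columns plus updated_at;
-- whether it helps on the real table would need to be measured.
CREATE INDEX transactions_latest_idx
    ON transactions (date, location_code, product_code, type, updated_at);
```

On the sample data the SELECT returns the three rows listed under the desired result, in no guaranteed order. If two rows in the same group share the newest updated_at, both are returned, so a tie-breaker would still be needed in that case.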
Thank you for the help!