-3

I have an archive with master data. This table is updated once a week to store attributes for each customer.

Customer | Attribute 1 | Attribute 2 | Attribute n | DateTime

It happens that a customerID is removed from one week to the next one.

Now I would like to extract for each Customer (Distinct) the last row that has been populated in this table.

How would you do that with SQL?

Aaron
  • 329
  • 3
  • 15
  • _"Now I would like to extract for each Customer (Distinct) the last row that has been populated in this table."_ - how do you determine the "last populated row", exactly? SQL does not have any concept of records-on-disk or table-insertion-order or anything like that - if there isn't a column or expression that you can use then you're SOL. – Dai Jul 02 '23 at 05:52
  • I think the question has been addressed already, check https://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group-mysql The basic idea is to group your records on `Customer`, then pick the max `DateTime` – MarcC Jul 02 '23 at 05:55
  • Don't you need a trvial MAX with GROUP BY? SELECT customer, MAX(datecolumn) FROM customers GROUP BY customer; Can be simply used as subquery in case further columns should also be selected. – Jonas Metzler Jul 02 '23 at 05:57
  • @Dai: Based on the last Column "DateTime". – Aaron Jul 02 '23 at 06:12
  • 1
    So... `SELECT Customer, MAX( "DateTime" ) AS LatestDateTime FROM myTable GROUP BY Customer ORDER BY LatestDateTime;` ? – Dai Jul 02 '23 at 06:14
  • 1
    Which dbms? (The answer might be product specific.) – jarlh Jul 02 '23 at 06:21
  • @jarlh: Databricks – Aaron Jul 02 '23 at 06:23

1 Answers1

0

You can use row_number() window funciton with common table expression to do that:

with cte as
(select customerID  , Attribute1 , Attribute2 , Attributen  , DateTime,
row_number()over(partition by customerID order by DateTime desc)rn
from customerArchive)
select customerID  , Attribute1 , Attribute2 , Attributen  , DateTime from cte where rn=1

If an extra column named rn is not a problem you can use *

with cte as
(select *,row_number()over(partition by customerID order by DateTime desc)rn
from customerArchive)
select * from cte where rn=1

Or you can again join the cte with customer table. It will only work if you don't have any duplicate record for a given DateTime. It will be slower

    with cte as
            (select *,row_number()over(partition by customerID order by DateTime desc)rn
            from customerArchive)
            select Customer.* from cte inner join Customer 
            on Customer.customerID=cte.customerID and Customer.DateTime=cte.DateTime
            where rn=1

*** As @nbk pointed out it’s never wise to use *. Mentioning column names might take some time but it will be much safer option.

  • Thanks a lot! Since the table contains like 150 columns > Any way to simplify the SELECT part? I would like to not name every column – Aaron Jul 02 '23 at 06:50
  • 2
    not naming the column is bad style and not recommended – nbk Jul 02 '23 at 08:01