0

I have a table with over 150k rows and about 40 columns. There are a lot of duplicates, most of the rows only differ by a date. I want to keep the rows with the latest date (some of the dates are in the future), where all the other columns are the same. I'm using Access database.

Roughly, the table looks like this:

Supplier code, Contract number, Incoterms, Currency, Price, First day of validity, Last day of validity
--------------------------
47650          2806751M         DDA        EUR       0.01   01/03/2010             31/12/2012
47650          2806751M         DDA        EUR       0.02   01/01/2013             31/12/2014
47650          2806751M         DDA        EUR       0.03   01/01/2015             31/12/2016
47650          2806751M         DDA        EUR       0.04   01/01/2017             31/12/2019

All these columns contain almost the same data, but the period of validity of the contract is the important variable, since it determines the price. That's why I want to keep the row that contains the latest date.

Andriy M
  • 76,112
  • 17
  • 94
  • 154
Henrik
  • 1
  • 1
  • 4

1 Answers1

1

"all the other columns are the same": This translates into a GROUP BY over all other columns and a MAX on the date:

select all other columns,
   MAX(datecol)
from tab
group by all other columns
dnoeth
  • 59,503
  • 4
  • 39
  • 56
  • The question states "most recent date" which may mean to ignore future dates. Then you would need `where datecol <= sysdate`. – TommCatt Jul 14 '14 at 02:17
  • @TommCatt: Actually, it's just the title that says "recent", the question's body is using the word "latest". Do both words equally have the connotation of "last before now" in English? – Andriy M Jul 14 '14 at 06:24
  • Anyway, @dnoeth, as the example now shows, you probably shouldn't group by "all other columns". The OP wants a specific column (Price) that goes with the latest date, so grouping must be by something else. (I've asked the OP to clarify.) – Andriy M Jul 14 '14 at 06:30
  • @Andriy M: The OP didn't say he wants the price of the latest date, "most of the rows only differ by a date" & "where all the other columns are the same". Maybe there will be some clarification... – dnoeth Jul 14 '14 at 08:41
  • All right, I may have read too much into it based on the example and some wording used. The question still seems very vague at this point. – Andriy M Jul 14 '14 at 08:54
  • @Andriy M, yes, in normal contexts, "latest" and "most recent" are synonymous. If a manager asked you for the latest sales figures or the most recent sales figures, he's asking for the same thing. But most "normal" contexts don't involve the future. If there is an application that uses future dates (projections, plans, schedules, etc.), then it is important to clarify any ambiguities that may arise. – TommCatt Jul 14 '14 at 18:02
  • Ok, some clarifications: I need to include future dates in the query. And something else: I need the price associated with the latest date. @dnoeth, could you tell me how I should modify the query for that? – Henrik Jul 16 '14 at 21:08