0

I have a set of sorted line items. They are sorted first by ID then by Date:

| ID  | DESCRIPTION |   Date   |
| --- | ----------- |----------|
| 100 | Red         |2019-01-01|
| 101 | White       |2019-01-01|
| 101 | White_v2    |2019-02-01|
| 102 | Red_Trim    |2019-01-15|
| 102 | White       |2019-01-16|
| 102 | Blue        |2019-01-20|
| 103 | Red_v3      |2019-01-14|
| 103 | Red_v3      |2019-03-14|

I need to insert rows in a SQL Server table, which represents a project header, so that the first row for each ID provides the Description and Date in the destination table. There should only be one row in the destination table for each ID.

For example, the source table above would result in this at the destination:

| ID  | DESCRIPTION |   Date   |
| --- | ----------- |----------|
| 100 | Red         |2019-01-01|
| 101 | White       |2019-01-01|
| 102 | Red_Trim    |2019-01-15|
| 103 | Red_v3      |2019-01-14|

How do I collapse the source so that I take only the first row for each ID from source?

I prefer to do this with a transformation in SSIS but can use SQL if necessary. Actually, solutions for both methods would be most helpful.


This question is distinct from Trouble using ROW_NUMBER() OVER (PARTITION BY …) in that this seeks to identify an approach. The asker of that question has adopted one approach, of more than one available as identified by answers here. That question is about how to make that particular approach work.

Hadi
  • 36,233
  • 13
  • 65
  • 124
Trevor Reid
  • 3,310
  • 4
  • 27
  • 46
  • I added an answer, however record `102` has two records for the min(date). How would decide here that `Red_Trim` is the "Header" description. I see no way to discern that `Red_Trim` is more correct than `White` for record `102`. Every other answer here using window function `ROW_NUMBER` will suffer the same issue, but will hide the problem a bit. Depending on how sql server feels you will either get `White` or `Red_Trim` back from the sql. – JNevill Apr 18 '19 at 18:30
  • Ah, thanks @JNevill and others for pointing that out. That was inadvertent. I think in practice it would be a datetime. – Trevor Reid Apr 18 '19 at 18:34
  • I was hoping you would say that. The real trick is that however you establish your ordering will need to be present in either the `ORDER BY` clause of the window function (if you go that route) or be taken into account for the correlated subquery solution. – JNevill Apr 18 '19 at 18:37

4 Answers4

4

A correlated subquery will help here:

 SELECT *
 FROM yourtable t1
 WHERE [Date] = (SELECT min([Date]) FROM yourtable WHERE id = t1.id)
JNevill
  • 46,980
  • 4
  • 38
  • 63
4

You can use row_number() :

select t.*
from (select t.*, row_number() over (partition by id order by date) as seq
      from table t
     ) t
where seq = 1;
Yogesh Sharma
  • 49,870
  • 5
  • 26
  • 52
3

use first_value window function

select * from (select *,
first_value(DESCRIPTION) over(partition by id order by Date) as des,
row_number() over(partition by id order by Date) rn
from table
) a where a.rn =1
Zaynul Abadin Tuhin
  • 31,407
  • 5
  • 33
  • 63
3

You can use the ROW_NUMBER() window function to do this. For example:

select *
from (
  select
    id, description, date,
    row_number() over(partition by id order by date) as rn
  from t
)
where rn = 1
The Impaler
  • 45,731
  • 9
  • 39
  • 76