SQL deleting rows with duplicate dates conditional upon values in two columns

Question

I have data on approx 1000 individuals, where each individual can have multiple rows, with multiple dates and where the columns indicate the program admitted to and a code number.

I need each row to contain a distinct date, so I need to delete the rows of duplicate dates from my table. Where there are multiple rows with the same date, I need to keep the row that has the lowest code number. In the case of more than one row having both the same date and the same lowest code, then I need to keep the row that also has been in program (prog) B. For example;

| ID |     DATE   | CODE | PROG|            
--------------------------------
| 1  | 1996-08-16 | 24   |  A  |
| 1  | 1997-06-02 | 123  |  A  |
| 1  | 1997-06-02 | 123  |  B  |
| 1  | 1997-06-02 | 211  |  B  |
| 1  | 1997-08-19 |  67  |  A  |
| 1  | 1997-08-19 |  23  |  A  |

So my desired output would look like this;

| ID |     DATE   | CODE | PROG|            
--------------------------------
| 1  | 1996-08-16 | 24   |  A  |
| 1  | 1997-06-02 | 123  |  B  |
| 1  | 1997-08-19 |  23  |  A  |

I'm struggling to come up with a solution to this, so any help greatly appreciated!

Microsoft SQL Server 2012 (X64)

I'm partial to using a window function and partitioning by the columns you deem necessary. Here's a link, bottom of the article describes the technique. Might get you most of the way there. https://www.simple-talk.com/sql/t-sql-programming/removing-duplicates-from-a-table-in-sql-server/ — Francisco Garcia, Feb 10 '16 at 16:13
Just the dates? What if you have different row IDs but the same date? — johnny, Feb 10 '16 at 16:14
Good point, only duplicate dates for each unique id. So there will still be duplicates but not within each unique id if that makes sense — Jules.Sanchez, Feb 10 '16 at 16:23
You might want to look here, http://stackoverflow.com/questions/2594829/finding-duplicate-values-in-a-sql-table?rq=1 — johnny, Feb 10 '16 at 16:26
I'd do this using a windowing function and cte such as [HERE](http://blog.sqlauthority.com/2009/06/23/sql-server-2005-2008-delete-duplicate-rows/) — SQLChao, Feb 10 '16 at 16:28

inverted_index · Answer 1 · 2016-02-10T17:18:58.680

0

You can use min() function: (See the details here)

select ID, DATE, min(CODE), max(PROG)
from table
group by DATE

I assume that your table has a valid primary key. However i would recommend you to take IDas Primary key. Hope this would help you.

edited Feb 10 '16 at 17:18

answered Feb 10 '16 at 16:28

inverted_index

2,329
21
40

i don't think this handles the situation "In the case of more than one row having both the same date and the same lowest code, then I need to keep the row that also has been in program (prog) B" – Stidgeon Feb 10 '16 at 16:52

Stidgeon · Accepted Answer · 2016-02-10T17:12:13.557

0

The following works with your test data

SELECT ID, date, MIN(code), MAX(prog) FROM table
GROUP BY date

You can then use the results of this query to create a new table or populate a new table. Or to delete all records not returned by this query.

SQLFiddle http://sqlfiddle.com/#!9/0ebb5/5

edited Feb 10 '16 at 17:12

answered Feb 10 '16 at 16:56

Stidgeon

2,673
8
20
28

Cheers! Thanks didn't think about using MAX(prog) as it was string. – Jules.Sanchez Feb 11 '16 at 12:12

SQL deleting rows with duplicate dates conditional upon values in two columns

2 Answers2