0

I have data on approx 1000 individuals, where each individual can have multiple rows, with multiple dates and where the columns indicate the program admitted to and a code number.

I need each row to contain a distinct date, so I need to delete the rows of duplicate dates from my table. Where there are multiple rows with the same date, I need to keep the row that has the lowest code number. In the case of more than one row having both the same date and the same lowest code, then I need to keep the row that also has been in program (prog) B. For example;

| ID |     DATE   | CODE | PROG|            
--------------------------------
| 1  | 1996-08-16 | 24   |  A  |
| 1  | 1997-06-02 | 123  |  A  |
| 1  | 1997-06-02 | 123  |  B  |
| 1  | 1997-06-02 | 211  |  B  |
| 1  | 1997-08-19 |  67  |  A  |
| 1  | 1997-08-19 |  23  |  A  |

So my desired output would look like this;

| ID |     DATE   | CODE | PROG|            
--------------------------------
| 1  | 1996-08-16 | 24   |  A  |
| 1  | 1997-06-02 | 123  |  B  |
| 1  | 1997-08-19 |  23  |  A  |

I'm struggling to come up with a solution to this, so any help greatly appreciated!

Microsoft SQL Server 2012 (X64)

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • I'm partial to using a window function and partitioning by the columns you deem necessary. Here's a link, bottom of the article describes the technique. Might get you most of the way there. https://www.simple-talk.com/sql/t-sql-programming/removing-duplicates-from-a-table-in-sql-server/ – Francisco Garcia Feb 10 '16 at 16:13
  • 1
    Just the dates? What if you have different row IDs but the same date? – johnny Feb 10 '16 at 16:14
  • Good point, only duplicate dates for each unique id. So there will still be duplicates but not within each unique id if that makes sense – Jules.Sanchez Feb 10 '16 at 16:23
  • You might want to look here, http://stackoverflow.com/questions/2594829/finding-duplicate-values-in-a-sql-table?rq=1 – johnny Feb 10 '16 at 16:26
  • I'd do this using a windowing function and cte such as [HERE](http://blog.sqlauthority.com/2009/06/23/sql-server-2005-2008-delete-duplicate-rows/) – SQLChao Feb 10 '16 at 16:28

2 Answers2

0

You can use min() function: (See the details here)

select ID, DATE, min(CODE), max(PROG)
from table
group by DATE

I assume that your table has a valid primary key. However i would recommend you to take IDas Primary key. Hope this would help you.

inverted_index
  • 2,329
  • 21
  • 40
  • i don't think this handles the situation "In the case of more than one row having both the same date and the same lowest code, then I need to keep the row that also has been in program (prog) B" – Stidgeon Feb 10 '16 at 16:52
0

The following works with your test data

SELECT ID, date, MIN(code), MAX(prog) FROM table
GROUP BY date

You can then use the results of this query to create a new table or populate a new table. Or to delete all records not returned by this query.

SQLFiddle http://sqlfiddle.com/#!9/0ebb5/5

Stidgeon
  • 2,673
  • 8
  • 20
  • 28