I have a PostgreSQL database that contains traffic tickets written by a few jurisdictions.
Some jurisdictions don't indicate if multiple tickets are written in one traffic stop. However, that can be inferred by analyzing other fields. Consider this data:
ticket_id timestamp drivers_license
----------------------------------------------
1 2008-08-07 01:51:00 11111111
2 2008-08-07 01:51:00 11111111
3 2008-08-07 02:02:00 22222222
4 2008-08-07 02:25:00 33333333
5 2008-08-07 04:23:00 44444444
6 2008-08-07 04:23:00 55555555
7 2008-08-07 04:23:00 44444444
I can infer that:
- Tickets 1 & 2 were written in a single traffic stop because they share driver's license numbers and timestamps.
- Same for 5 & 7, but notice how ticket 6 is between them. Perhaps another officer was writing a ticket at the same time somewhere else, or data entry operators entered stuff out of order.
I would like to add another column that has a unique ID for each traffic stop. It will not be a primary key for the table because it will have duplicate values. For example:
ticket_id timestamp drivers_license stop_id
--------------------------------------------------------
1 2008-08-07 01:51:00 11111111 1
2 2008-08-07 01:51:00 11111111 1
3 2008-08-07 02:02:00 22222222 2
4 2008-08-07 02:25:00 33333333 3
5 2008-08-07 04:23:00 44444444 4
6 2008-08-07 04:23:00 55555555 5
7 2008-08-07 04:23:00 44444444 4
I can think of computationally-intensive, greedy algorithm ways of doing this with C#, but is there an efficient SQL query that can work?