how to delete duplicate rows from query without any unique identifier

Question

I am currently trying to update a table which contains a large number of duplicate rows, but contains no unique identifier. I want to add a minus sign (-) before the te_system_ref of each duplicate, but leave the original. I have the following query that gives me the full list of duplicates (as shown below) but I can't figure out how I would go about only removing the duplicates.

Current query:

select te_system_ref, te_event, te_Date
from ticket_events
where te_system_ref in (select te_system_ref from ticket_events
                        where te_event = 'VQ5 hard copy follows'
                          and te_date between '04/12/17' and '18/06/18 23:59:59' 
group by te_system_ref
having count (te_event) > 4)
   and te_event = 'VQ5 hard copy follows'

So instead of:

1046338 2018-04-24 07:01:57.000  16210  VQ5 hard copy follows
1046338 2018-04-24 07:01:58.000  16210  VQ5 hard copy follows
1046338 2018-04-25 07:02:49.000  16210  VQ5 hard copy follows
1046338 2018-04-25 07:02:50.000  16210  VQ5 hard copy follows
1064317 2018-03-21 16:21:52.000  16210  VQ5 hard copy follows
1064317 2018-03-27 12:32:16.000  16210  VQ5 hard copy follows
1064317 2018-04-18 07:00:38.000  16210  VQ5 hard copy follows
1064317 2018-04-19 07:00:39.000  16210  VQ5 hard copy follows
1064351 2018-03-21 16:21:47.000  16210  VQ5 hard copy follows
1064351 2018-03-27 12:31:51.000  16210  VQ5 hard copy follows
1064351 2018-04-18 07:01:50.000  16210  VQ5 hard copy follows
1064351 2018-04-19 07:02:03.000  16210  VQ5 hard copy follows

I would get:

1046338 2018-04-24 07:01:57.000  16210  VQ5 hard copy follows
1064317 2018-03-21 16:21:52.000  16210  VQ5 hard copy follows
1064351 2018-03-21 16:21:47.000  16210  VQ5 hard copy follows

Any help would be greatly appreciated.

edit: I have since found the solution which is as below:

Update t set te_system_ref = ('-' + Convert(nvarchar(50),te_system_ref))
from (
select row_number() over (partition by te_system_ref order by (select 0)) as rn, te_system_ref, te_event from ticket_events where te_system_ref in 
(
select te_system_ref from ticket_events where te_event = 'VQ5 hard copy follows' and te_date between '04/12/17' and '18/06/18 23:59:59' group by te_system_ref having count (te_event) > 10
)
and te_event =  'VQ5 hard copy follows'
)t
where t.rn <> 1 and te_system_ref not like '-%'

Add some sample table data and the expected result - all as formatted text, not images. — jarlh, Jun 18 '18 at 12:36
`row_number over (partition by te_system_ref) AS rn` and then select rn=1 (or delete where rn <> 1 ) — joop, Jun 18 '18 at 12:38
For duplicates identification why do you have `and te_date between '04/12/17' and '18/06/18 23:59:59' ` ? Why is this missing in outer select? — DhruvJoshi, Jun 18 '18 at 12:40
https://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows — WhatsThePoint, Jun 18 '18 at 12:41
@joop i am trying to implement this, could you point me to where i'm going wrong? — james mcrow, Jun 18 '18 at 14:11
Update ticket_events set te_system_ref = ('-' + ticket_events.te_system_ref) from ( select row_number() over (partition by te_system_ref order by (select 0)) as rn, te_system_ref, te_event from ticket_events where te_system_ref in ( select te_system_ref from ticket_events where te_event = 'VQ5 hard copy follows' and te_date between '04/12/17' and '18/06/18 23:59:59' group by te_system_ref having count (te_event) > 3 ) and te_event = 'VQ5 hard copy follows' ) where rn <> 1 — james mcrow, Jun 18 '18 at 14:12
Please **do not** put code samples or sample data into comments - since you cannot format it, it's **extremely hard** to read it.... Instead: **update** your question by editing it to provide that additional information! Thank you. — marc_s, Jun 18 '18 at 14:31
I really suggest you add one... even if it's just a pretend one like an `IDENTITY`. It's bad idea to not have a unique identifier, particularly if you are in the habit of getting duplicate rows — Nick.Mc, Jun 26 '18 at 10:47

score 0 · Answer 1 · answered Jun 18 '18 at 17:52

0

You cannot.

...but contains no unique identifier...

Well, in SQL all rows with the same values are equivalent. You cannot distinguish two rows that look the same, and therefore you cannot update just one of them.

You would need to add some kind of unique key to identify each row. Maybe an extra column populated using a sequence, a timestamp, or something. If you have that in place, then you can start differentiating rows, and your life will become much easier.

answered Jun 18 '18 at 17:52

The Impaler

45,731
9
39
76

thanks for the reply but I have since figured this out. In the end I used row_number to create the unique identifier needed. – james mcrow Jun 26 '18 at 10:40

how to delete duplicate rows from query without any unique identifier

1 Answers1