I have searched numerous answers in here stackoverflow but I can't seem to understand how to delete duplicate rows. In my case, delete rows with the same pd
in a td
.
I have a table here with a name questions
.
Let pd
be problem_id, td
be task_id (just to make the naming short)
id score pd td
'1', '10', '673', '2663'
'2', '10', '674', '2663'
'3', '10', '675', '2663'
'4', '10', '676', '2663'
'5', '10', '677', '2663'
'6', '10', '678', '2663'
'7', '10', '674', '2663'
'8', '10', '675', '2663'
'9', '10', '676', '2663'
'10', '10', '677', '2663'
'11', '10', '710', '2700'
'12', '10', '711', '2700'
'13', '10', '710', '2700'
'14', '10', '711', '2700'
How to do an sql script that will result to this:
id score pd td
'1', '10', '673', '2663'
'2', '10', '674', '2663'
'3', '10', '675', '2663'
'4', '10', '676', '2663'
'5', '10', '677', '2663'
'6', '10', '678', '2663'
'11', '10', '710', '2700'
'12', '10', '711', '2700'
I'm still not an expert in SQL and I have this code that I based from this answer https://stackoverflow.com/a/2594879/7362231 but it seems that it is giving me an error.
DELETE d
FROM staging_db.questions as d
INNER JOIN (SELECT
y.id, y.problem_id, y.task_id, ROW_NUMBER() OVER(PARTITION BY y.problem_id,y.task_id ORDER BY y.problem_id,y.task_id,y.id) AS RowRank
FROM staging_db.questions as y
INNER JOIN (SELECT
problem_id,task_id, COUNT(*) AS CountOf
FROM staging_db.questions
GROUP BY problem_id,task_id
HAVING COUNT(*)>1
) as dt ON y.problem_id=dt.problem_id AND y.task_id=dt.task_id
) dt2 ON d.id=dt2.id
WHERE dt2.RowRank!=1
SELECT * FROM staging_db.questions;