I have a simple database table
create table demo (
id integer PRIMARY KEY,
fv integer,
sv text,
rel_id integer,
FOREIGN KEY (rel_id)
REFERENCES demo(id));
and i want to delete all duplicate rows grouped by fv
and sv
. Which is already a fairly popular question with great answers.
But I need a twist on that scenario. As in cases where rel_id
is NULL
I want to keep that row. In any other case anything goes.
So by using the following values
insert into demo (id,fv,sv,rel_id)
VALUES (1,1,'somestring',NULL),
(2,2,'somemorestring',1),
(3,1,'anotherstring',NULL),
(4,2,'somemorestring',3),
(5,1,'somestring',3)
Either
id | fv | sv | rel_id
---+----+------------------+-------
1 | 1 | 'somestring' | NULL
2 | 2 | 'somemorestring' | 1
3 | 1 | 'anotherstring' | NULL
or
id | fv | sv | rel_id
---+----+------------------+-------
1 | 1 | 'somestring' | NULL
3 | 1 | 'anotherstring' | NULL
4 | 2 | 'somemorestring' | 3
would be valid results. Where as
id | fv | sv | rel_id
---+----+------------------+-------
3 | 1 | 'anotherstring' | NULL
4 | 2 | 'somemorestring' | 3
5 | 1 | 'somestring' | 3
would not be. As the first entry had NULL
as rel_id
which takes presidency above NOT NULL
.
I currently have this (which is an answer on the basic duplicate question) as a query to remove duplicates but I am not sure how to continue to modify the query to fit my needs.
DELETE FROM demo
WHERE id NOT IN (SELECT min(id) as id
FROM demo
GROUP BY fv,sv)
As as soon as the NOT NULL
entry is inserted into the database before the NULL
entry the NOT NULL
one will not be deleted. It is guaranteed that rel_id
will always point to an entry where rel_id
is NULL
therefore there is no danger of deleting a referenced entry. Further it is guaranteed that there will be no two rows in the same group with rel_id IS NULL
. Therefore a row with rel_id IS NULL
is unique for the whole table.
Or as a basic algorithm:
- Go over all rows and group them by
fv
andsv
- Look into each group for a row where
rel_id IS NULL
. If there is keep that row (and delete the rest). Else pick one row of your choice and delete the rest.