SQL (POSTGRESQL) drop duplicate values based on certain columns only, keep newer value based on each duplicate set

Question

I have the following SQL table called readings.

date        |  today  | yesterday | tomorrow | creationtime               | source
2021-01-01      110       0.5         0        2021-01-01 12:42:17....       x1
2021-01-01      110       0.5         0        2021-01-01 12:42:17....       x2
2021-01-01      150       0.9         1        2021-01-01 12:55:17....       x3
....
2021-02-15      110       0.3         1        2021-02-15 12:42:17....       x1
2021-02-15      110       0.1         1        2021-02-15 12:42:17....       x2
2021-02-15      150       0.9         1        2021-02-15 12:55:17....       x3
...
2021-02-15      110       0.5         0        2021-02-16 16:06:04.008673    x17
2021-02-15      110       0.5         0        2021-02-15 15:59:46.383677    x17
....
2021-02-15      700       0.7         1        2021-02-16 16:04:02.267478    x20
2021-02-15      110       0.7         1        2021-02-15 15:59:48.060236    x20
....
2021-02-22      110       0.5         1        2021-02-15 16:01:16.826577    x55
2021-02-22      110       0.5         1        2021-02-16 16:09:17.524436    x55

There are 65 readings every day, one from each source: x1, x2, x3, ... up to x65.

I found duplicate readings on certain days: the same date and source appear more than once.

Sometimes the duplicated readings differ, so I want to keep the newer reading for that day, even if it was only recorded the following day.

I want to drop the duplicates and keep the row with the newer creation time, so that my table ends up looking like this.

date        |  today  | yesterday | tomorrow | creationtime               | source
2021-01-01      110       0.5         0        2021-01-01 12:42:17....       x1
2021-01-01      110       0.5         0        2021-01-01 12:42:17....       x2
2021-01-01      150       0.9         1        2021-01-01 12:55:17....       x3
....
2021-02-15      110       0.3         1        2021-02-15 12:42:17....       x1
2021-02-15      110       0.1         1        2021-02-15 12:42:17....       x2
2021-02-15      150       0.9         1        2021-02-15 12:55:17....       x3
...
2021-02-15      110       0.5         0        2021-02-16 16:06:04.008673    x17
....
2021-02-15      700       0.7         1        2021-02-16 16:04:02.267478    x20
....
2021-02-22      110       0.5         1        2021-02-16 16:09:17.524436    x55

I tried to do

create table new_readings as select distinct c.* from readings c;

But it just creates a copy of the table and only drops rows that are completely identical.
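
For readers following along, a minimal sketch of how the duplicates described above could be listed before removing anything. It assumes a duplicate means the same "date" and source occurring more than once, which is how the sample data reads; the column aliases copies and newest are made up for illustration.

```sql
-- Sketch: list every ("date", source) pair that occurs more than once.
-- "copies" and "newest" are illustrative aliases, not columns of readings.
select "date",
       source,
       count(*)          as copies,
       max(creationtime) as newest
from readings
group by "date", source
having count(*) > 1
order by "date", source;
```

On the sample rows this would be expected to report pairs such as 2021-02-15 / x17, each with two copies.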

score 2 · Accepted Answer · answered Jun 01 '21 at 17:40

It seems to be simply

select distinct on ("date", source) *
from readings
order by "date", source, creationtime desc;

which reads "pick only one (the latest) reading per source per day".
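
To end up with a cleaned table, as the question's own attempt with create table ... as tried to do, the same query can feed a new table. This is a sketch only: new_readings is the name used in the question, and the check afterwards is an added suggestion rather than part of the answer.

```sql
-- Sketch: materialize the deduplicated rows into a new table.
-- new_readings is the table name from the question's attempt.
create table new_readings as
select distinct on ("date", source) *
from readings
order by "date", source, creationtime desc;

-- Optional check: no ("date", source) pair should remain duplicated,
-- so this is expected to return zero rows.
select "date", source, count(*)
from new_readings
group by "date", source
having count(*) > 1;
```

If two duplicates share exactly the same creationtime, which of them is kept is not deterministic unless another column is added to the order by.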

Stefanov.sm

OMG IT WORKED. Why did you write "date"? I've never seen the double-quote syntax – anarchy Jun 01 '21 at 17:45
Because `date` is a reserved word. – Stefanov.sm Jun 01 '21 at 17:50
What do you mean? So when I use "date" it's the date from my table and not the date which is the reserved word? – anarchy Jun 01 '21 at 17:51
Well, yes, you can use non-conformant names if you enclose them in double quotes. Quoting "date" keeps the column name from being confused with the data type of the same name. However, IMHO it's better not to use reserved words or non-conformant names at all. – Stefanov.sm Jun 01 '21 at 17:58
Also, what's the difference between distinct and distinct on? And why do you use an asterisk outside the brackets? – anarchy Jun 01 '21 at 18:07
There is a good SO answer to this question [here](https://stackoverflow.com/questions/50846722/what-is-the-difference-between-postgres-distinct-vs-distinct-on). – Stefanov.sm Jun 01 '21 at 18:13
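
As a rough illustration of the difference asked about in the comments above, here are the two forms side by side on the readings table from the question. The comments in the code describe PostgreSQL behavior; treat this as a sketch, not a replacement for the linked answer.

```sql
-- distinct: a row is dropped only if EVERY selected column matches
-- another row, so duplicates with different creationtime values all stay.
select distinct *
from readings;

-- distinct on (...): rows are grouped by the expressions in the
-- parentheses only, and the first row of each group is kept, where
-- "first" is decided by the order by. The * after the closing
-- parenthesis is the ordinary select list (all columns of the kept row).
select distinct on ("date", source) *
from readings
order by "date", source, creationtime desc;
```

The leading order by expressions have to match the distinct on expressions; the trailing creationtime desc is what makes the newest row the one that is kept.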

score 0 · Answer 2 · Gordon Linoff · 2021-06-01T17:55:20.673

You can use distinct on:

select distinct on (date, today, yesterday, tomorrow ) r.*
from readings r
order by date, today, yesterday, tomorrow, creationtime desc;

edited Jun 01 '21 at 17:55

answered Jun 01 '21 at 17:09

It should be distinct on date and source, then select only one based on creation time. – anarchy Jun 01 '21 at 17:16
@anarchy add * after distinct on (...) – eshirvana Jun 01 '21 at 17:31
Will it work though? Did you see that on 2021-02-15 there are 2 different readings under today? – anarchy Jun 01 '21 at 17:33
It doesn't work, I lose a lot of data @eshirvana – anarchy Jun 01 '21 at 17:36
I updated my question, I think it wasn't clear enough: there are 65 sources every day. – anarchy Jun 01 '21 at 17:40

score 0 · Answer 3 · answered Jun 01 '21 at 17:23

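The code below deletes the older duplicated rows, keeping the row with the newest "creationtime" for each "date" and "source":

-- PostgreSQL's DELETE has no ORDER BY clause, so none is used here.
delete from readings r1
    where exists(
        select 1 from readings r2
        where r2.creationtime > r1.creationtime  -- a newer row exists
        and r1.source = r2.source                -- for the same source
        and r1."date" = r2."date"                -- on the same day
    );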
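
Another way to delete in place, offered as a sketch and not as part of the answer above: rank the rows within each ("date", source) group by creationtime and delete everything except the newest. It assumes PostgreSQL and a table without a primary key, so it identifies rows by the system column ctid; ranked and rn are illustrative names.

```sql
-- Sketch: keep only the newest row per ("date", source), delete the rest.
-- ctid is PostgreSQL's physical row identifier; it is used here only
-- because the question shows no primary key on readings.
delete from readings r
using (
    select ctid,
           row_number() over (
               partition by "date", source
               order by creationtime desc
           ) as rn
    from readings
) ranked
where r.ctid = ranked.ctid
  and ranked.rn > 1;   -- rn = 1 is the newest row and is kept
```

Unlike a comparison on creationtime alone, this also removes extra copies whose creationtime values are identical, since row_number() still gives each row a different rank. Running it inside a transaction (begin; ... rollback;) first is a cautious way to see how many rows it would remove.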

I.Brayche

I updated my question, I think it wasn't clear enough: there are 65 sources every day. – anarchy Jun 01 '21 at 17:39
What is your DBMS: Oracle, PostgreSQL, MySQL ...? – I.Brayche Jun 01 '21 at 17:46
