0

I have the following SQL table called readings.

date        |  today  | yesterday | tomorrow | creationtime               | source
2021-01-01      110       0.5         0        2021-01-01 12:42:17....       x1
2021-01-01      110       0.5         0        2021-01-01 12:42:17....       x2
2021-01-01      150       0.9         1        2021-01-01 12:55:17....       x3
....
2021-02-15      110       0.3         1        2021-02-15 12:42:17....       x1
2021-02-15      110       0.1         1        2021-02-15 12:42:17....       x2
2021-02-15      150       0.9         1        2021-02-15 12:55:17....       x3
...
2021-02-15      110       0.5         0        2021-02-16 16:06:04.008673    x17
2021-02-15      110       0.5         0        2021-02-15 15:59:46.383677    x17
....
2021-02-15      700       0.7         1        2021-02-16 16:04:02.267478    x20
2021-02-15      110       0.7         1        2021-02-15 15:59:48.060236    x20
....
2021-02-22      110       0.5         1        2021-02-15 16:01:16.826577    x55
2021-02-22      110       0.5         1        2021-02-16 16:09:17.524436    x55

There are 65 readings every day, from sources x1, x2, x3 ... through x65.

I found duplicate readings on certain days.

Sometimes the duplicated readings differ, so I want to keep the newer reading for that day, even if it was only recorded the following day.

I want to drop the duplicated rows and keep the one with the newer creation time, so that my table ends up looking like this.

date        |  today  | yesterday | tomorrow | creationtime               | source
2021-01-01      110       0.5         0        2021-01-01 12:42:17....       x1
2021-01-01      110       0.5         0        2021-01-01 12:42:17....       x2
2021-01-01      150       0.9         1        2021-01-01 12:55:17....       x3
....
2021-02-15      110       0.3         1        2021-02-15 12:42:17....       x1
2021-02-15      110       0.1         1        2021-02-15 12:42:17....       x2
2021-02-15      150       0.9         1        2021-02-15 12:55:17....       x3
...
2021-02-15      110       0.5         0        2021-02-16 16:06:04.008673    x17
....
2021-02-15      700       0.7         1        2021-02-16 16:04:02.267478    x20
....
2021-02-22      110       0.5         1        2021-02-16 16:09:17.524436    x55

I tried to do

create table new_readings as select distinct c.* from readings c;

But it just creates a copy of the table and only drops rows that are completely identical.

anarchy

3 Answers

2

It seems to be simply

select distinct on ("date", source) *
from readings
order by "date", source, creationtime desc;

which reads "pick only one (the latest) reading per source per day".
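Since the question asks for a deduplicated table rather than only a result set, the same query can be wrapped in `create table ... as`. The following is a minimal sketch, assuming PostgreSQL (`distinct on` is a PostgreSQL extension). The tables `readings_demo` and `new_readings_demo` and the column types are hypothetical, chosen for illustration; the sample rows are the x17 and x20 duplicates from the question.

```sql
-- Hypothetical demo table; the column types are assumptions.
create temporary table readings_demo (
    "date"       date,
    today        integer,
    yesterday    numeric,
    tomorrow     integer,
    creationtime timestamp,
    source       text
);

insert into readings_demo values
    ('2021-02-15', 110, 0.5, 0, '2021-02-16 16:06:04.008673', 'x17'),
    ('2021-02-15', 110, 0.5, 0, '2021-02-15 15:59:46.383677', 'x17'),
    ('2021-02-15', 700, 0.7, 1, '2021-02-16 16:04:02.267478', 'x20'),
    ('2021-02-15', 110, 0.7, 1, '2021-02-15 15:59:48.060236', 'x20');

-- One row per ("date", source), keeping the latest creationtime.
create temporary table new_readings_demo as
select distinct on ("date", source) *
from readings_demo
order by "date", source, creationtime desc;

-- Should return no rows: no ("date", source) pair appears twice.
select "date", source, count(*)
from new_readings_demo
group by "date", source
having count(*) > 1;
```

With these four rows, `new_readings_demo` should end up with two rows: the x17 row created on 2021-02-16 and the x20 row with `today = 700`.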

Stefanov.sm
  • OMG IT WORKED. Why did you write "date"? I've never seen the double-quote syntax – anarchy Jun 01 '21 at 17:45
  • Because `date` is a reserved word. – Stefanov.sm Jun 01 '21 at 17:50
  • What do you mean? So when I use "date" it's the date column from my table and not the reserved word date? – anarchy Jun 01 '21 at 17:51
  • Well, yes, you can use non-conformant names if you enclose them in double quotes. Quoting "date" avoids confusing the column name with the data type of the same name. However, IMHO it's better not to use reserved words or non-conformant names at all. – Stefanov.sm Jun 01 '21 at 17:58
  • Also, what's the difference between distinct and distinct on? And why do you use an asterisk outside the brackets? – anarchy Jun 01 '21 at 18:07
  • There is a good SO answer to this question [here](https://stackoverflow.com/questions/50846722/what-is-the-difference-between-postgres-distinct-vs-distinct-on). – Stefanov.sm Jun 01 '21 at 18:13
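
To illustrate the difference raised in the comments: plain `distinct` treats two rows as duplicates only when every selected column matches, while `distinct on (...)` compares only the listed expressions and keeps the first row of each group in `order by` order. The sketch below assumes PostgreSQL and the `readings` table from the question. The last query is a window-function alternative for databases without `distinct on`; the alias `rn` and the subquery name `ranked` are illustrative.

```sql
-- Plain DISTINCT: rows are duplicates only if ALL columns are equal,
-- so two readings that differ in creationtime are both kept.
select distinct * from readings;

-- DISTINCT ON: rows are duplicates if the expressions in parentheses are equal.
-- The "*" after the parentheses is the ordinary select list (all columns);
-- the first row of each group, in ORDER BY order, is the one returned.
select distinct on ("date", source) *
from readings
order by "date", source, creationtime desc;

-- Alternative with a window function: number the rows within each
-- ("date", source) group, newest first, and keep only row number 1.
select "date", today, yesterday, tomorrow, creationtime, source
from (
    select r.*,
           row_number() over (
               partition by "date", source
               order by creationtime desc
           ) as rn
    from readings r
) ranked
where rn = 1;
```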
0

You can use distinct on:

select distinct on (date, today, yesterday, tomorrow ) r.*
from readings r
order by date, today, yesterday, tomorrow, creationtime desc;
Gordon Linoff
0

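The code below deletes the older duplicate rows for each "date" and "source", keeping the row with the latest "creationtime".

-- delete a row if a newer row exists for the same date and source
delete from readings r1
where exists (
    select 1 from readings r2
    where r2."date" = r1."date"
      and r2.source = r1.source
      and r2.creationtime > r1.creationtime
);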
I.Brayche
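
Because `delete` changes the table in place, it may be safer to run it inside a transaction and check the result before committing. This is a minimal sketch, assuming PostgreSQL and assuming that a duplicate means a row with the same `"date"` and `source`. Rows that share an identical `creationtime` would both survive this delete.

```sql
begin;

-- Remove every row for which a newer row exists with the same date and source.
delete from readings r1
where exists (
    select 1
    from readings r2
    where r2."date" = r1."date"
      and r2.source = r1.source
      and r2.creationtime > r1.creationtime
);

-- Should return no rows if every ("date", source) pair is now unique.
select "date", source, count(*)
from readings
group by "date", source
having count(*) > 1;

-- Run "rollback;" instead if the check shows something unexpected.
commit;
```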