My challenge is to find pairs of rows that are adjacent by timestamp and keep only those pairs with minimal distance of a value field (positive values of the difference)
A table measurement
collects data from different sensors with a timestamp and a value.
id | sensor_id | timestamp | value
---+-----------+-----------+------
1 | 1 | 12:00:00 | 5
2 | 2 | 12:01:00 | 6
3 | 1 | 12:02:00 | 4
4 | 2 | 12:02:00 | 7
5 | 2 | 12:03:00 | 3
6 | 1 | 12:05:00 | 3
7 | 2 | 12:06:00 | 4
8 | 2 | 12:07:00 | 5
9 | 1 | 12:08:00 | 6
A sensor's value is valid from its timestamp until the timestamp of its next record (same sensor_id).
Graphical representation
The lower green line shows the distance of sensor 1's (blue line) and sensor 2's (red line) values over time.
My aim is
- to combine only those records of 2 sensors that match the timestamp logic (to get the green line)
- to find the dinstance local minimums at
- 12:01:00 (at 12:00:00 there's no record for sensor 2)
- 12:05:00
- 12:08:00
The real table resides in a PostgreSQL database and contains about 5 million records of 15 sensors.
Test data
create table measurement (
id serial,
sensor_id integer,
timestamp timestamp,
value integer)
;
insert into measurement (sensor_id, timestamp, value)
values
(1, '2020-08-16 12:00:00', 5),
(2, '2020-08-16 12:01:00', 6),
(1, '2020-08-16 12:02:00', 4),
(2, '2020-08-16 12:02:00', 7),
(2, '2020-08-16 12:03:00', 3),
(1, '2020-08-16 12:05:00', 3),
(2, '2020-08-16 12:06:00', 4),
(2, '2020-08-16 12:07:00', 5),
(1, '2020-08-16 12:08:00', 6)
;
My approach
was to pick 2 arbitrary sensors (by certain sensor_ids), make a self join and retain for any sensor 1's record only that record of the sensor 2 with the previous timestamp (biggest timestamps of sensor 2 with sensor 1's timestamp <= sensor 2's timestamp).
select
*
from (
select
*,
row_number() over (partition by m1.timestamp order by m2.timestamp desc) rownum
from measurement m1
join measurement m2
on m1.sensor_id <> m2.sensor_id
and m1.timestamp >= m2.timestamp
--arbitrarily sensor_ids 1 and 2
where m1.sensor_id = 1
and m2.sensor_id = 2
) foo
where rownum = 1
union --vice versa
select
*
from (
select
*,
row_number() over (partition by m2.timestamp order by m1.timestamp desc) rownum
from measurement m1
join measurement m2
on m1.sensor_id <> m2.sensor_id
and m1.timestamp <= m2.timestamp
--arbitrarily sensor_ids 1 and 2
where m1.sensor_id = 1
and m2.sensor_id = 2
) foo
where rownum = 1
;
But that returns a pair with 12:00:00
where sensor 2 has no data (not a big problem)
and on the real table the statement execution does not end after hours (big problem).
I found certain similar questions but they don't match my problem
Thanks in advance!