3

I'm trying to find an answer about what is the most efficient way to get first element and last element in given time interval. I have a table interval_data (includes like iot data) which has relation with device table. I want to get a result for each device's first and last element.

Interval Data Table:

    id           device_id          created_at           value
    15269665      1000206      2018-07-21 00:10:00    5099.550000
    15270533      1000206      2018-07-21 00:20:00    5099.610000
    15271400      1000206      2018-07-21 00:30:00    5099.760000
    15272269      1000206      2018-07-21 00:40:00    5099.850000
    15273132      1000206      2018-07-21 00:50:00    5099.910000
    15274040      1000206      2018-07-21 01:00:00    5099.970000
    15274909      1000206      2018-07-21 01:10:00    5100.030000
    15275761      1000206      2018-07-21 01:20:00    5100.110000
    15276629      1000206      2018-07-21 01:30:00    5100.160000
    15277527      1000206      2018-07-21 01:40:00    5100.340000
    15278351      1000206      2018-07-21 01:50:00    5100.400000
    15279219      1000206      2018-07-21 02:00:00    5100.450000
    15280085      1000206      2018-07-21 02:10:00    5100.530000
    15280954      1000206      2018-07-21 02:20:00    5100.590000
    15281858      1000206      2018-07-21 02:30:00    5100.640000
    15282724      1000206      2018-07-21 02:40:00    5100.750000
    15283627      1000206      2018-07-21 02:50:00    5100.870000
    15284495      1000206      2018-07-21 03:00:00    5100.930000
      ...           ...                ...                ...

I tried some queries like:

select created_at, value from interval_data i inner join
(select min(created_at) minin, max(created_at) maxin, d.device_id from device 
d
inner join interval_data i on i.device_id = d.device_id
where d.device_id in (1000022, 1000023, 1000024)
and i.created_at between '2018-01-01 00:00:00' and '2019-01-01 00:00:00' 
group by d.device_id) s
on s.device_id = i.device_id and (s.minin = i.created_at or s.maxin = 
i.created_at)

But when device number increases, response time takes a long time. Do you have any suggestion? How can I find each devices' first and last element faster?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Furkan Uyar
  • 376
  • 4
  • 5
  • Your Postgres version and table definition (`CREATE TABLE` statements showing data types and constraints) are instructive for any question involving SQL. For performance questions in particular. And any additional indexes you might have. And whether you can change table and indexes. Also: "first" and "last" by `id` or by `created_at`? And do you want devices without *any* interval data included in the result? – Erwin Brandstetter Jul 21 '18 at 14:01
  • Also you mention `result for each device`, but your query attempt is for a small selection of devices: `device_id in (1000022, 1000023, 1000024)`. *Can* make a big difference. – Erwin Brandstetter Jul 21 '18 at 14:30
  • Thanks Erwin, first and last meanings by 'created_at'. I need just value and created_at from interval_data and device_id from device. I wrote `device_id in (1000022, 1000023, 1000024)` part as an example, can be two device and sometimes can be eigthy devices. But your solution with lateral and limit 1 logic, it works really good. – Furkan Uyar Jul 23 '18 at 06:36
  • So is this question answered? – Erwin Brandstetter Dec 20 '22 at 22:53

2 Answers2

1

The most efficient query depends on details of your setup. Typically, a query with two LATERAL subqueries should be fastest:

SELECT *  -- or just the columns you need
FROM device d
LEFT JOIN LATERAL (
   SELECT id AS first_intv_id, created_at AS first_created_at, value AS first_value
   FROM   interval_data
   WHERE  device_id = d.id
   ORDER  BY created_at
   LIMIT  1
   ) f ON true
LEFT JOIN LATERAL (
   SELECT id AS last_intv_id, created_at AS last_created_at, value AS last_value
   FROM   interval_data
   WHERE  device_id = d.id
   ORDER  BY created_at DESC  -- NULLS LAST if column isn't NOT NULL
   LIMIT  1
   ) l ON true;

fiddle

Postgres can translate this to a query plan with only fast index scans on the big table interval_data.

About LATERAL:

Be sure to have an index on interval_data(device_id, created_at). If you only need a limited set of columns in the result, it may pay to append more columns to this index to get index-only scans out of it.

LEFT JOIN ... ON true keeps devices with no interval data in the result.

To restrict to a given set of device IDs, append to the query:

...
WHERE  d.id IN (1000022, 1000023, 1000024);

And have an index on device(id) - which would be the typical case anyway.

Assuming a current Postgres version and a setup like this:

CREATE TABLE device (
   id     serial PRIMARY KEY
 , device text NOT NULL
);

CREATE TABLE interval_data (
   id         serial PRIMARY KEY
 , device_id  int NOT NULL
 , created_at timestamp NOT NULL
 , value      numeric NOT NULL
 , CONSTRAINT device_fkey FOREIGN KEY (device_id) REFERENCES device (id)
);

If some involved columns are not defined NOT NULL, you may have to adjust details.

The FK constraint is optional for this solution.

Detailed explanation and discussion of alternatives:

Alternative for a small set of given device IDs

Using window functions with custom window frames can do without a separate table device and may be faster for a small set of IDs:

SELECT DISTINCT ON (device_id)
       device_id
     , first_value(created_at) OVER w AS first_created_at
     , first_value(value)      OVER w AS first_value
     , last_value (created_at) OVER w AS last_created_at
     , last_value (value)      OVER w AS last_value
FROM   interval_data
WHERE  device_id IN (1000022, 1000023, 1000024)
WINDOW w AS (PARTITION BY device_id ORDER BY created_at
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);

fiddle

Same as in the first query above:

  • No result for passed device IDs that do not exist.

But different from the first query above:

  • No result for passed device IDs that do exist, but without any interval data.

About window frames:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

You can use row_number to assign an increasing number to each row with the same device_id. If you do this twice, once in ascending and once in descending order, you can grab the first and the last row per group:

select  device_id
,       created_at
,       value
from    (
        select  row_number() over (partition by device_id order by created_at) rn1
        ,       row_number() over (partition by device_id order by created_at desc) rn2
        ,       *
        from    interval_data
        ) i
where   device_id in (1, 3, 4)
        and (rn1 = 1 or rn2 = 1) -- First or last row per device
        and created_at between '2018-01-01 00:00:00' and '2019-01-01 00:00:00' 

Example at SQL Fiddle.

Andomar
  • 232,371
  • 49
  • 380
  • 404