1

I have a problem with the executing speed of my sql query to postgres database.

I have 2 tables:

table 1: DEVICES

ID | NAME
------------------
1  | first device
2  | second device

table 2: DATA

ID | DEVICE_ID | TIME                | DATA
--------------------------------------------
1  | 1         | 2016-07-14 2:00:00  | data1
2  | 1         | 2016-07-14 1:00:00  | data2
3  | 2         | 2016-07-14 4:00:00  | data3
4  | 1         | 2016-07-14 3:00:00  | data4
5  | 2         | 2016-07-14 6:00:00  | data5
6  | 2         | 2016-07-14 5:00:00  | data6

I need get this select's result table:

ID | DEVICE_ID | TIME               | DATA
-------------------------------------------
4  | 1         | 2016-07-14 3:00:00 | data4
5  | 2         | 2016-07-14 6:00:00 | data5

i.e. for each device in devices table I need to get only one data record with the last TIME value.

This is my sql query:

SELECT * FROM db.data d 
    WHERE d.time = (
        SELECT MAX(d2.time) FROM db.data d2 
             WHERE d2.device_id = d.device_id);

This is HQL query equivalent:

SELECT d FROM Data d 
    WHERE d.time = (
        SELECT MAX(d2.time) FROM Data d2 
            WHERE d2.device.id = t2.device.id)

Yes, I use Hibernate ORM in my project - may this info will be useful for someone.

I got correct answer on my queries, BUT it's too long - about 5-10 seconds on 10k records in data table and only 2 devices in devices table. It's terrible.

First of all, I thought that problem is in Hibernate. But native sql query from psql in linux terminal execute the same time as through hibernate.

How can I optimize my query? This query is too complexity:

 O(device_count * data_count^2)
Binakot
  • 298
  • 5
  • 15
  • 1
    Look at the second answer to the question I marked as duplicate, it has a Postgresql specific solution (`distinct on`) that could be the best choice. – jpw Jul 21 '16 at 10:03
  • 1
    Yea, u're right. I solve this: SELECT DISTINCT ON (device_id) * from db.data order by device_id, time DESC; – Binakot Jul 21 '16 at 10:23

1 Answers1

2

Since you're using Postgres, you could use window functions to achieve this, like so:

select
    sq.id,
    sq.device_id,
    sq.time,
    sq.data
from (
    select
        data.*,
        row_number() over (partition by data.device_id order by data.time desc) as rnk
    from
        data
) sq
where
    sq.rnk = 1

The row_number() window function first ranks the rows in the data table on the basis of the device_id and time columns, and the outer query then picks the highest-ranked rows.

Chitharanjan Das
  • 1,283
  • 10
  • 15
  • Thanks. It works. In ~100500 times faster. Awesome. But there is some not-postgres solution, that I can realize by HQL in named query? – Binakot Jul 21 '16 at 10:16
  • @Binakot Personally, I don't have any experience writing HQL, but from what I've read online, it doesn't seem to support window functions. – Chitharanjan Das Jul 21 '16 at 13:26
  • Yes, it doesn't support window functions or distinct on. Postgres is very powerful. I used native query to solve that. Thanks. – Binakot Jul 21 '16 at 13:48
  • @Binakot awesome. don't forget to mark the best answer as "accepted" :) – Chitharanjan Das Jul 21 '16 at 13:51