0

I'm looking forward to optimize a query i have. The tablese are:

static_data: historical configuration data of items

id(serial pk) timestamp(default now() indexed) tag(varchar 255 fk from tag indexed) prop(varchar 32) value(text)
1 2022-03-11 03:45:01 AA-12 max_value 100
2 2022-03-11 03:45:01 AA-12 alarm_high 90
3 2022-03-11 03:45:01 AA-12 alarm_low 5
4 2022-03-11 03:45:01 AA-12 min_value -30
5 2022-03-11 03:45:01 AA-12 card_type 200HL-A
6 2022-03-11 03:45:01 BB-14 max_value 5.5
7 2022-03-11 03:45:01 BB-14 min_value 1.1
8 2023-01-12 12:22:56 AA-12 alarm_low <null>
9 2023-01-12 12:22:56 AA-12 card_type 300H-BS
10 2023-01-12 12:22:56 AA-12 alarm_high 80

tag: tag list (there are other columns that are not related to this query)

tag(varchar 255 pk)
AA-12
BB-14

The purpose of static_data is to provide historical data about items configuration. In the example above the item AA-12 was changed on 2022-03-11 03:45:01, max_value, min_value, alarm_high and alarm_low were set. The next year it was changed again, alarm_high was changed and alarm_low was removed because the sensor hw was updated and the new hw does not have alarm_low.

To make use of this data i need a query to retrieve the configuration of all items at any time.

Currently I'm using this query:

 SELECT sd.tag,
    jsonb_object_agg(sd.prop, sd.value) AS data
   FROM ( SELECT DISTINCT ON (static_data.tag, static_data.prop) static_data.id,
            static_data."timestamp",
            static_data.tag,
            static_data.prop,
            static_data.value
           FROM hierarchy.static_data
          WHERE static_data."timestamp" < now()
          ORDER BY static_data.tag, static_data.prop, static_data."timestamp" DESC) sd
  WHERE sd.value IS NOT NULL
  GROUP BY sd.tag

Which results in:

tag data
AA-12 {"max_value":"100", "min_value":"-30", "alarm_high": "80", "card_type":"300H-BS"}
BB-14 {"max_value":"5.5", "min_value":"1.1"}

The result is as exactly what i need, for every tag a jsonb which has for each property the most up-to-date value before <timestamp> without the nulls, the problem being the query is too slow, it takes 200-220ms to run on my test data-sample 1224tags with 6 props each.

Tag count in production should be 6k but in future projects can be as high as 10-15k, each item will have about 30-40 props depending on item type, so the static_data table scales pretty fast with time.

I cannot save the entire jsonb with the tag config every time it is changed because i need the prop that is changed or removed in reports about config change and in dashboard for every tag.prop to display the historical value.

This issue is similar to mine and accepted answer in section 2 addresses my issue partially, but my result should be grouped by tag and prop first, then by tag to aggregate the config in a single jsonb.

EDIT:

EXPLAIN(ANALYZE, VERBOSE, BUFFERS):

GroupAggregate  (cost=1013.39..1116.72 rows=200 width=38) (actual time=14.228..20.671 rows=1224 loops=1)
  Output: sd.tag, jsonb_object_agg(sd.prop, sd.value)
  Group Key: sd.tag
  Buffers: shared hit=110
  ->  Subquery Scan on sd  (cost=1013.39..1108.13 rows=1218 width=28) (actual time=14.217..16.450 rows=6811 loops=1)
        Output: sd.id, sd."timestamp", sd.tag, sd.prop, sd.value
        Filter: (sd.value IS NOT NULL)
        Rows Removed by Filter: 1
        Buffers: shared hit=110
        ->  Unique  (cost=1013.39..1095.89 rows=1224 width=44) (actual time=14.214..15.867 rows=6812 loops=1)
              Output: NULL::bigint, static_data."timestamp", static_data.tag, static_data.prop, static_data.value
              Buffers: shared hit=110
              ->  Sort  (cost=1013.39..1040.89 rows=11000 width=44) (actual time=14.214..14.609 rows=11000 loops=1)
                    Output: NULL::bigint, static_data."timestamp", static_data.tag, static_data.prop, static_data.value
                    Sort Key: static_data.tag, static_data.prop, static_data."timestamp" DESC
                    Sort Method: quicksort  Memory: 1244kB
                    Buffers: shared hit=110
                    ->  Seq Scan on hierarchy.static_data  (cost=0.00..275.00 rows=11000 width=44) (actual time=0.007..1.827 rows=11000 loops=1)
                          Output: NULL::bigint, static_data."timestamp", static_data.tag, static_data.prop, static_data.value
                          Filter: (static_data."timestamp" < now())
                          Buffers: shared hit=110
Planning:
  Buffers: shared hit=3
Planning Time: 0.138 ms
Execution Time: 20.725 ms

static_data create SQL:

https://pastebin.com/raw/E9u2j6ra

tag create SQL:

https://pastebin.com/raw/kWfTNC5L

  • Could you please share the result (in plain text!) from EXPLAIN(ANALYZE, VERBOSE, BUFFERS) for this statement and all DDL (including the indexes) for all tables involved? Otherwise we don't know where the time is spent. – Frank Heikens Feb 03 '23 at 12:37
  • After running EXPLAIN(ANALYZE, VERBOSE, BUFFERS) now it takes 25-45ms, much better, thank you! But... Why? XD – Fiodar Shurankou Feb 03 '23 at 14:01
  • All data could be found in memory, see the buffers, all from shared hit. On a slow or busy disk your query could become slow again when the data is not in memory yet. – Frank Heikens Feb 03 '23 at 14:07

0 Answers0