7

I have a json array with around 1000 elements of the structure "oid: aaa, instance:bbb, value:ccc".

{"_id": 37637070
, "data": [{"oid": "11.5.15.1.4", "value": "1", "instance": "1.1.4"}
         , {"oid": "11.5.15.1.9", "value": "17", "instance": "1.1.4"}
         , {"oid": "12.5.15.1.5", "value": "0.0.0.0", "instance": "0"}]}

oid and instance are unique per json array. If I was given the option to change the structure I would have changed the format to key:value:

{"11.5.15.1.4-1.1.4":"1", "11.5.15.1.9-1.1.4": "17", "12.5.15.1.5-0": "0.0.0.0"}

However, if I need to stay with the old structure

  1. What is the fastest way to get specific oid from the array?

  2. What is the fastest way to get a table with 3 columns of oid, instance and value. Or even better a pivot table with oid+instance as the column header.

For 2. I tried the following, but it was pretty slow on a large table:

select *
from (
   select a->>'oid' oid, a->>'instance' instance, a->>'value' value1, id
   from (
      select jsonb_array_elements(config#>'{data}')  a, id
      from configuration
      ) b
   ) c
where  oid = '1.3.6.1.4.1.7352.3.10.2.5.35.3' and instance = '0' and value1 <> '1';
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
user1409708
  • 953
  • 2
  • 11
  • 20
  • Your query can be made a lot simpler looking as `select a->>'oid' oid, a->>'instance' instance, a->>'value' value1, id from ( select jsonb_array_elements(config#>'{data}') a, id from configuration ) b where a->>'instance' = '1.1.4' and a->>'oid' = '11.5.15.1.4' and a->>'value'<>'2';` but PostgreSQL optimizes both your and this query into the same thing. – Sami Kuhmonen Jun 14 '15 at 09:19
  • Though it might make it faster if in the innermost query you also restricted the resulting rows with `config@>'{"data":[{"oid":"11.5.15.1.4","instance":"0"}]}'`, then there wouldn't need to be as many rows from `jsonb_array_elements` to check. – Sami Kuhmonen Jun 14 '15 at 09:39
  • The definition of your table should be in the question. What you get with `\d configuration` in psql. Or a valid `CREATE TABLE` script. – Erwin Brandstetter Jun 14 '15 at 12:24

1 Answers1

9

Query

Your table definition is missing. Assuming:

CREATE TABLE configuration (
  config_id serial PRIMARY KEY
, config jsonb NOT NULL
);

To find the a value and its row for given oid and instance:

SELECT c.config_id, d->>'value' AS value
FROM   configuration c
     , jsonb_array_elements(config->'data') d  -- default col name is "value"
WHERE  d->>'oid'      = '1.3.6.1.4.1.7352.3.10.2.5.35.3'
AND    d->>'instance' = '0'
AND    d->>'value'   <> '1';

That's an implicit LATERAL join. Compare:

  1. What is the fastest way to get a table with 3 columns of oid, instance and value.

I suppose to use jsonb_populate_recordset(), then you can provide data types in the table definition. Assuming text for all:

CREATE TEMP TABLE data_pattern (oid text, value text, instance text);

Could also be a persisted (non-temp) table. This one is only for the current session. Then:

SELECT c.config_id, d.*
FROM   configuration c
     , jsonb_populate_recordset(NULL::data_pattern, c.config->'data') d;

That's all. The first query rewritten:

SELECT c.config_id, d.*
FROM   configuration c
     , jsonb_populate_recordset(NULL::data_pattern, c.config->'data') d
WHERE  d.oid      = '1.3.6.1.4.1.7352.3.10.2.5.35.3'
AND    d.instance = '0'
AND    d.value   <> '1';

But that's slower than the first query. Key to performance with bigger table is index support:

Index

You could easily index the normalized (translated) table or the alternative layout you proposed in the question. Indexing your current layout is not as obvious, but also possible. For best performance I suggest a functional index on just the data key with the jsonb_path_ops operator class. Per documentation:

The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates independent index items for each key and value in the data, while the latter creates index items only for each value in the data.

This should work wonders for performance:

CREATE INDEX configuration_my_idx ON configuration
USING gin ((config->'data') jsonb_path_ops);

One might expect that only a complete match for a JSON array element would work, like:

SELECT * FROM configuration
WHERE  (config->'data') @> '[{"oid": "1.3.6.1.4.1.7352.3.10.2.5.35.3"
                            , "instance": "0", "value": "1234"}]';

Note the JSON array notation (with enclosing []) of the provided value: that's required.

But array elements with a subset of keys work as well:

SELECT * FROM configuration
WHERE  (config->'data') @> '[{"oid": "1.3.6.1.4.1.7352.3.10.2.5.35.3"
                            , "instance": "0"}]'

The hard part is to incorporate your added predicate value <> '1'. Care must be taken to apply all predicates to the same array element. You could combine this with the first query:

SELECT c.*, d->>'value' AS value
FROM   configuration c
     , jsonb_array_elements(config->'data') d
WHERE  (config->'data') @> '[{"oid": "1.3.6.1.4.1.7352.3.10.2.5.35.3", "instance": "0"}]'
AND    d->>'oid'      = '1.3.6.1.4.1.7352.3.10.2.5.35.3'  -- must be repeated
AND    d->>'instance' = '0'                               -- must be repeated
AND    d->>'value'   <> '1'                               -- here we can rule out

Voilá.

Special index

If your table is huge, index size may be a deciding factor. You could compare performance of this special solution with a functional index:

This function extracts a Postgres array of oid-instance combinations from a given jsonb value:

CREATE OR REPLACE FUNCTION f_config_json2arr(_j jsonb)
  RETURNS text[]
  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT ARRAY(
   SELECT (elem->>'oid') || '-' || (elem->>'instance')
   FROM   jsonb_array_elements(_j) elem
   )
$func$;

We can build a functional index based on this:

CREATE INDEX configuration_conrfig_special_idx ON configuration
USING  gin (f_config_json2arr(config->'data'));

And base the query on it:

SELECT * FROM configuration
WHERE  f_config_json2arr(config->'data') @> '{1.3.6.1.4.1.7352.3.10.2.5.35.3-0}'::text[]

The idea is that the index should be substantially smaller because it only stores the combined values without keys. The array containment operator @> itself should perform similar to the jsonb containment operator @>. I don't expect a big difference, but I would be very interested which is faster.

Similar to the first solution in this related answer (but more specialized):

Asides

  • I would not use oid as column name since that's also used for internal purposes in Postgres.
  • If possible, I would use a plain, normalized table without JSON.
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you for an excellent example of the `jsonb_populate_recordset` usage. I am curious of performance benefits. Maybe OP will share the results of his tests. – klin Jun 14 '15 at 13:26
  • For some reason the query suggested takes around 20 seconds while the following takes 5 seconds - SELECT obj->>'oid', obj->>'value' FROM configuration r, jsonb_array_elements(r.config->'data') obj WHERE obj->>'oid' = '3.10.2.5.35.3' – user1409708 Jun 14 '15 at 14:13
  • Thank you very much! Once I added the index the performance is very good. From several seconds it went down to less than one second. Can I have in the condition (config->'data') @> '[{"oid": "1.3.6.1.4.1.7352.3.10.2.5.35.3", "instance": "0"}]' several elements with OR between them? – user1409708 Jun 18 '15 at 08:42
  • @user1409708: I suggest you start a new question. You can always link to this one for context. BTW, did you just try the `jsonb_path_ops` index or the functional index as well? – Erwin Brandstetter Jun 18 '15 at 12:44