Composite JSONB array query in Postgres?

Question

Table: test, JSONB column: content:

create table test (id bigserial primary key, content jsonb);

content contains a list of fixed-length lists:

insert into test values (1, '[["first 1", "second 3"]]');
insert into test values (2, '[["first 1", "second 2"], ["first 2", "second 3"]]');
insert into test values (3, '[["first 1", "second 2"], ["first 1", "second 3"]]');
insert into test values (4, '[["first 2", "second 3"], ["first 1", "second 2"], ["first 1", "second 2"]]');

What's the correct Postgres syntax for a query that returns all rows where at least one of the content elements satisfies (first element = "first 1") AND (second element ILIKE "%3%")?

That is, in the example above, it should select rows 1 and 3, but not 2 or 4.

Bonus question: what is the most efficient way to do such query (in case there are multiple alternatives)? Does it make sense to look into GIN over JSONB with pg_trgm? (There are millions of rows, the inner string values are typically 10-100 characters long, and each content list contains 0-1000s of lists (most usually 0).)

Thanks!

Jeremy · Answer 1 · 2019-06-06T19:55:50.453

3

You should split apart the top level arrays and check the elements from there:

select distinct id, content
FROM test
JOIN lateral (
    select elems 
    FROM jsonb_array_elements(content) jae(elems)
) all_arrays ON TRUE
WHERE elems ->> 0 = 'first 1'
and elems ->> 1 ilike '%3%'
ORDER BY 1;

As for the best way to do this, that depends a lot on your actual data - how many rows, how big these jsonb structures are, etc. In general, though, a search like ilike '%3%' will benefit from indexes based off of pg_trgm because they can't use traditional btree indexes.

Edit: @Abelisto's query in the comments is better because it should be more performant, especially if content can contain 1000s of elements:

select * from test 
where exists 
  (select 1 
   from jsonb_array_elements(content) jae(elems) 
   where elems ->> 0 = 'first 1' 
   and elems ->> 1 ilike '%3%'
  );

edited Jun 06 '19 at 19:55

answered Jun 06 '19 at 19:26

Jeremy

6,313
17
20

2

Using `exists` could be shorter and more convenient: `select * from test where exists (select 1 from jsonb_array_elements(content) jae(elems) where elems ->> 0 = 'first 1' and elems ->> 1 ilike '%3%')` – Abelisto Jun 06 '19 at 19:35
Thanks Abelisto, I like that as well. – Jeremy Jun 06 '19 at 19:39
@Jeremy I updated the question with expected data sizes. Can you include an example of creating the right index for the second list element (the one queried with `ILIKE`)? – user124114 Jun 06 '19 at 19:41
@Abelisto "shorter" as in faster? Convenience is not a concern here. Correctness and performance is. – user124114 Jun 06 '19 at 19:45
1

@user124114 It could be faster because it removes two operations from the query: `join` and `distinct`. But to get really good performance you need to normalize your data structure. – Abelisto Jun 06 '19 at 19:50
@Abelisto normalize in what sense? – user124114 Jun 06 '19 at 20:07
1

@user124114 He means use single values per row instead of using jsonb. If your data were in columns of (id, first_val, second_val) you could index first_val with a btree index and pretty easily add the trgm index to second_val. – Jeremy Jun 06 '19 at 20:35

score 1 · Accepted Answer · answered Jun 06 '19 at 19:35

Inner select expands array elements into separate rows with jsonb_array_elements, outer select does the filtering you want. See SQL Fiddle for live example.

select * from (
select id, jsonb_array_elements(content) as item from test  
) as expandedtest
where item->>0 like 'first 1' and item->>1 like '%3%'

Composite JSONB array query in Postgres?

2 Answers2

Linked