1

Table: test, JSONB column: content:

create table test (id bigserial primary key, content jsonb);

content contains a list of fixed-length lists:

insert into test values (1, '[["first 1", "second 3"]]');
insert into test values (2, '[["first 1", "second 2"], ["first 2", "second 3"]]');
insert into test values (3, '[["first 1", "second 2"], ["first 1", "second 3"]]');
insert into test values (4, '[["first 2", "second 3"], ["first 1", "second 2"], ["first 1", "second 2"]]');

What's the correct Postgres syntax for a query that returns all rows where at least one of the content elements satisfies (first element = "first 1") AND (second element ILIKE "%3%")?

That is, in the example above, it should select rows 1 and 3, but not 2 or 4.

Bonus question: what is the most efficient way to do such query (in case there are multiple alternatives)? Does it make sense to look into GIN over JSONB with pg_trgm? (There are millions of rows, the inner string values are typically 10-100 characters long, and each content list contains 0-1000s of lists (most usually 0).)

Thanks!

user124114
  • 8,372
  • 11
  • 41
  • 63

2 Answers2

3

You should split apart the top level arrays and check the elements from there:

select distinct id, content
FROM test
JOIN lateral (
    select elems 
    FROM jsonb_array_elements(content) jae(elems)
) all_arrays ON TRUE
WHERE elems ->> 0 = 'first 1'
and elems ->> 1 ilike '%3%'
ORDER BY 1;

As for the best way to do this, that depends a lot on your actual data - how many rows, how big these jsonb structures are, etc. In general, though, a search like ilike '%3%' will benefit from indexes based off of pg_trgm because they can't use traditional btree indexes.

Edit: @Abelisto's query in the comments is better because it should be more performant, especially if content can contain 1000s of elements:

select * from test 
where exists 
  (select 1 
   from jsonb_array_elements(content) jae(elems) 
   where elems ->> 0 = 'first 1' 
   and elems ->> 1 ilike '%3%'
  );
Jeremy
  • 6,313
  • 17
  • 20
  • 2
    Using `exists` could be shorter and more convenient: `select * from test where exists (select 1 from jsonb_array_elements(content) jae(elems) where elems ->> 0 = 'first 1' and elems ->> 1 ilike '%3%')` – Abelisto Jun 06 '19 at 19:35
  • Thanks Abelisto, I like that as well. – Jeremy Jun 06 '19 at 19:39
  • @Jeremy I updated the question with expected data sizes. Can you include an example of creating the right index for the second list element (the one queried with `ILIKE`)? – user124114 Jun 06 '19 at 19:41
  • @Abelisto "shorter" as in faster? Convenience is not a concern here. Correctness and performance is. – user124114 Jun 06 '19 at 19:45
  • 1
    @user124114 It could be faster because it removes two operations from the query: `join` and `distinct`. But to get really good performance you need to normalize your data structure. – Abelisto Jun 06 '19 at 19:50
  • @Abelisto normalize in what sense? – user124114 Jun 06 '19 at 20:07
  • 1
    @user124114 He means use single values per row instead of using jsonb. If your data were in columns of (id, first_val, second_val) you could index first_val with a btree index and pretty easily add the trgm index to second_val. – Jeremy Jun 06 '19 at 20:35
1

Inner select expands array elements into separate rows with jsonb_array_elements, outer select does the filtering you want. See SQL Fiddle for live example.

select * from (
select id, jsonb_array_elements(content) as item from test  
) as expandedtest
where item->>0 like 'first 1' and item->>1 like '%3%'
Anton
  • 3,587
  • 2
  • 12
  • 27