2

Currently I'm writing queries against a JSONB table with 8 million+ rows. How can I query from the parent and the friends objects in the most efficient manner possible?

Query (Postgres 9.6):

select distinct id, data->>'_id' jsonID, data->>'email' email, friends->>'name' friend_name, parent->>'name' parent
from temp t
CROSS JOIN jsonb_array_elements(t.data->'friends') friends
CROSS JOIN jsonb_array_elements(friends->'parent') parent
where friends ->> 'name' = 'Chan Franco'
and parent->>'name' = 'Hannah Golden'

Example DDL (with data): https://pastebin.com/byN7uyKx

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
sfasu77
  • 35
  • 5
  • Please put all defining information into your question. Your version of Postgres, table definition, sample value. (Lengthy test case can stay at the external site - a fiddle site like dbfiddle.uk or sqlfiddle.com would be better suited for this.) – Erwin Brandstetter Nov 06 '18 at 20:59
  • 1
    A more efficient way would be to normalize your model. http://blog.2ndquadrant.com/postgresql-anti-patterns-unnecessary-jsonhstore-dynamic-columns/ –  Nov 06 '18 at 21:30
  • Thanks for the replies, I'll be looking to normalize this data model! – sfasu77 Nov 06 '18 at 22:01

1 Answers1

1

Your regularly structured data would be cleaner, smaller and faster as normalized relational design.

That said, to make the setup you have much faster (if not as fast as a normalized design with matching indexes), add a GIN index on the expression data->'friends':

 CREATE INDEX tbl_data_friends_gin_idx ON tbl USING gin ((data->'friends'));

Then add a matching WHERE clause to our query with the contains operator @>:


SELECT DISTINCT  -- why DISTINCT ?
       id, data->>'_id' AS json_id, data->>'email' AS email
     , friends->>'name' AS friend_name, parent->>'name' AS parent
FROM   tbl t
CROSS  JOIN jsonb_array_elements(t.data->'friends') friends
CROSS  JOIN jsonb_array_elements(friends->'parent') parent
WHERE  t.data->'friends' @> '[{"name": "Chan Franco", "parent": [{"name": "Hannah Golden"}]}]'
AND    friends->>'name' = 'Chan Franco'
AND    parent ->>'name' = 'Hannah Golden';

db<>fiddle here

The huge difference: With the help of the index, Postgres can now identify matching rows before unnesting each an every nested "friends" array in the whole table. Only after having identified matching rows in the underlying table, jsonb_array_elements() is called and resulting rows with qualifying array elements are kept.

Note that the search expression has to be valid JSON, matching the structure of the JSON array data->'friends' - including the outer brackets []. But omit all key/value pairs that are not supposed to serve as filter.

Related:

I avoided the table name temp as this is an SQL key word, that might lead to confusing errors. Using the name tbl instead.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228