108

I have a table that looks like this:

CREATE TABLE tracks (id SERIAL, artists JSON);

INSERT INTO tracks (id, artists) 
  VALUES (1, '[{"name": "blink-182"}]');

INSERT INTO tracks (id, artists) 
  VALUES (2, '[{"name": "The Dirty Heads"}, {"name": "Louis Richards"}]');

There's several other columns that aren't relevant to this question. There's a reason to have them stored as JSON.

What I'm trying to do is lookup a track that has a specific artist name (exact match).

I'm using this query:

SELECT * FROM tracks 
  WHERE 'ARTIST NAME' IN
    (SELECT value->>'name' FROM json_array_elements(artists))

for example

SELECT * FROM tracks
  WHERE 'The Dirty Heads' IN 
    (SELECT value->>'name' FROM json_array_elements(artists))

However, this does a full table scan, and it isn't very fast. I tried creating a GIN index using a function names_as_array(artists), and used 'ARTIST NAME' = ANY names_as_array(artists), however the index isn't used and the query is actually significantly slower.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
JeffS
  • 2,647
  • 2
  • 19
  • 24
  • I've made a follow up question based on this one: http://dba.stackexchange.com/questions/71546/index-for-finding-an-element-in-a-json-array-in-postgresql-with-multiple-json-f – Ken Li Jul 15 '14 at 11:37

1 Answers1

194

jsonb in Postgres 9.4+

The binary JSON data type jsonb largely improves index options. You can now have a GIN index on a jsonb array directly:

CREATE TABLE tracks (id serial, artists jsonb);  -- !
CREATE INDEX tracks_artists_gin_idx ON tracks USING gin (artists);

No need for a function to convert the array. This would support a query:

SELECT * FROM tracks WHERE artists @> '[{"name": "The Dirty Heads"}]';

@> being the jsonb "contains" operator, which can use the GIN index. (Not for json, only jsonb!)

Or you use the more specialized, non-default GIN operator class jsonb_path_ops for the index:

CREATE INDEX tracks_artists_gin_idx ON tracks
USING  gin (artists jsonb_path_ops);  -- !

Same query.

Currently jsonb_path_ops only supports the @> operator. But it's typically much smaller and faster. There are more index options, details in the manual.


If the column artists only holds names as displayed in the example, it would be more efficient to store just the values as JSON text primitives and the redundant key can be the column name.

Note the difference between JSON objects and primitive types:

CREATE TABLE tracks (id serial, artistnames jsonb);
INSERT INTO tracks  VALUES (2, '["The Dirty Heads", "Louis Richards"]');

CREATE INDEX tracks_artistnames_gin_idx ON tracks USING gin (artistnames);

Query:

SELECT * FROM tracks WHERE artistnames ? 'The Dirty Heads';

? does not work for object values, just keys and array elements.

Or:

CREATE INDEX tracks_artistnames_gin_idx ON tracks
USING  gin (artistnames jsonb_path_ops);

Query:

SELECT * FROM tracks WHERE artistnames @> '"The Dirty Heads"'::jsonb;

More efficient if names are highly duplicative.

json in Postgres 9.3+

This should work with an IMMUTABLE function:

CREATE OR REPLACE FUNCTION json2arr(_j json, _key text)
  RETURNS text[] LANGUAGE sql IMMUTABLE AS
'SELECT ARRAY(SELECT elem->>_key FROM json_array_elements(_j) elem)';

Create this functional index:

CREATE INDEX tracks_artists_gin_idx ON tracks
USING  gin (json2arr(artists, 'name'));

And use a query like this. The expression in the WHERE clause has to match the one in the index:

SELECT * FROM tracks
WHERE  '{"The Dirty Heads"}'::text[] <@ (json2arr(artists, 'name'));

Updated with feedback in comments. We need to use array operators to support the GIN index.
The "is contained by" operator <@ in this case.

Notes on function volatility

You can declare your function IMMUTABLE even if json_array_elements() isn't wasn't.
Most JSON functions used to be only STABLE, not IMMUTABLE. There was a discussion on the hackers list to change that. Most are IMMUTABLE now. Check with:

SELECT p.proname, p.provolatile
FROM   pg_proc p
JOIN   pg_namespace n ON n.oid = p.pronamespace
WHERE  n.nspname = 'pg_catalog'
AND    p.proname ~~* '%json%';

Functional indexes only work with IMMUTABLE functions.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 2
    This doesn't work because returning `SETOF` can't be used in an index. Removing that, I can create the index, however it isn't used by the query planner. Also, both json_array_elements and array_agg are `IMMUTABLE` – JeffS Aug 23 '13 at 17:21
  • @ErwinBrandstetter Is this still the best way to go with the release of Postgres 9.4? – Tony Dec 30 '14 at 01:05
  • @Tony: No, there are better ways now. I would use `jsonb` and a native GIN index in pg 9.4. Added a chapter to the answer. – Erwin Brandstetter Dec 30 '14 at 01:50
  • @ErwinBrandstetter How does the index know that name is nested in an array? I'm trying to do something similar here: http://stackoverflow.com/questions/26499266/whats-the-proper-index-for-querying-structures-in-arrays-in-postgres-json but can't get my queries to use the index – Tony Dec 30 '14 at 02:01
  • 2
    @Tony: Sorry, I was mixing column name and key name. Fixed and added more. – Erwin Brandstetter Dec 30 '14 at 12:37
  • Strangely I need to add [] so it work, And it never use the index, I would love to find out why! ````SELECT * FROM tracks WHERE artists @> '[{"name": "The Dirty Heads"}]';```` – PyWebDesign Apr 23 '15 at 16:44
  • 1
    @PyWebDesign: jsonb containment queries generally must match the same structure as the containing object (so searching for an object inside an array means you must query using an object inside an array). There is a special exception for primitive types inside an array; more details here: http://stackoverflow.com/a/29947194/818187 – potatosalad Apr 29 '15 at 14:58
  • 3
    @PyWebDesign: I see now, the array layer was missing in one example. Fixed. The index is only going to be used in a table big enough so that it's cheaper for Postgres than a sequential scan. – Erwin Brandstetter Apr 29 '15 at 18:03
  • @ErwinBrandstetter, Thanks this is exactly what I was searching for, do you know if I can force it to use the index, maybe juste once, so I can confirm it work. – PyWebDesign Apr 29 '15 at 23:07
  • 3
    @PyWebDesign: Run in your session `SET enable_seqscan = off;` (for debugging purposes only) http://stackoverflow.com/questions/14554302/postgres-query-optimization-forcing-a-index-scan/14555618#14555618. – Erwin Brandstetter Apr 29 '15 at 23:37