11

I'm trying to use the "newish" JSONB type.

I have a documents table with a properties jsonb field, and in that is a field publication_year. I want to find all document records within a year range e.g. 2013-2015. [EDIT: Querying for a range of values is the main challenge here, even though I have used an exact match example below. The requested approach would also apply for, say dollar ranges (price > $20 and price < $40) or timestamp ranges).]

I have tried:

create index test1 on documents using gin ((cast(properties->'announced_on_year' as integer)));

ERROR:  cannot cast type jsonb to integer

as well as:

create index test1 on documents using gin (cast(properties->>'publication_year' as integer));

ERROR:  data type integer has no default operator class for access method "gin"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.`

I saw from this post http://www.postgresql.org/message-id/10736.1409063604@sss.pgh.pa.us that this should be possible, but I can't figure out the right syntax.

When I just do a straightforward index:

create index test1 on documents using gin ((properties->'publication_year'));

an index is created, but I cannot query it using integer values to get a range going, it says

select count(*) from documents where properties->>'publication_year' = 2015;
ERROR:  operator does not exist: text = integer
LINE 1: ...*) from documents where properties->>'publication_year' = 2015;
                              ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.

Any tips and hints highly appreciated. I'm sure others will benefit too. TIA

mkobit
  • 43,979
  • 12
  • 156
  • 150
Will Kessler
  • 565
  • 1
  • 7
  • 17

4 Answers4

6

I've found in my experiences that using GIN indexes on JSONB columns was not faster. You can just create a normal index by casting it to an integer

CREATE INDEX test1 ON documents ((properties->>'publication_year')::int);

Also, GIN has some limitations that should be considered before creating one. Even indexing the entire JSONB column can result in massive table-sized indexes.

This is based on my experience and looking through the Postgres documentation.

qwerty
  • 155
  • 3
  • 9
3

1) There are no GIN indexes for integer (at least not out of the box), use a btree.

create index test1 on documents using btree (cast (properties->>'announced_on_year' as int));

2) The error is pretty self-explanatory, cast the integer as text or use text for the comparison:

select count(*) from documents where properties->>'publication_year' = '2015';
Jakub Kania
  • 15,665
  • 2
  • 37
  • 47
  • Thank you, that is helpful, but it does not address my main problem: range querying (perhaps I should have been more explicit). The end goal is something like find all records where `publication_year` is > 2012 and < 2016. – Will Kessler Mar 22 '15 at 16:43
  • @WillKessler You can do that with a b-tree index. Have you tried? – Jakub Kania Mar 22 '15 at 17:01
  • I had tried, but was not getting the query right. But now with your help and murison, I've got it, see the comment above to murison's response. Thank you for your assistance however. – Will Kessler Mar 22 '15 at 22:24
2

You can cast as integer and use contrib/btree_gin extension.

create extension btree_gin;
create index tt_jb_int_idx on tt using gin( cast (jb->>'price' as int));
explain analyze select * from tt where cast(jb->>'price' as int)  > 3 and cast(jb->>'price' as int) > 'price'::text))::integer > 3) AND (((jb ->> 'price'::text))::integer   Bitmap Index Scan on tt_jb_int_idx  (cost=0.00..28.06 rows=6 width=0) (actual time=0.016..0.016 rows=1 loops=1)
         Index Cond: ((((jb ->> 'price'::text))::integer > 3) AND (((jb ->> 'price'::text))::integer 
1

Why don't you define an index for the whole jsonb field, as described in the doc?

create index test1 on documents using gin (properties);
mkobit
  • 43,979
  • 12
  • 156
  • 150
murison
  • 3,640
  • 2
  • 23
  • 36
  • Thank you-- I think this is the answer I was seeking. Even though the docs don't specify it, you can with this solution use CAST to get range results, as in: `EXPLAIN ANALYZE SELECT COUNT(*) FROM documents WHERE cast(properties->>'publication_year' AS integer) > 2012 AND cast(properties->>'publication_year' AS integer) < 2016; – Will Kessler Mar 22 '15 at 22:20
  • 1
    I am not sure, if using it this way will have an impact on performance (which I guess is the goal). According to the doc - the "->>" operator is nott supported by this type of index. Also - you can combine `x > A AND x < B` into `x BETWEEN A AND B` – murison Mar 23 '15 at 08:58