I'm trying to index a large JSONB column based on a text field (with an ISO date string). This index works fine using = but is ignored if I use a > condition.
create table test_table (
id text NOT null primary key,
data jsonb,
text_test text
);
Then I add a bunch data to the jsonb column. And to ensure my JSON is valid, extract/copy the value I'm interested in from the JSONB column into another text column to test against too.
update test_table set text_test = (data->>'dueDate');
A quick sample shows it's good ISO formatted date strings:
select text_test, (data->>'dueDate') from test_table limit 1;
-- 2020-08-07T11:59:00 2020-08-07T11:59:00
I add btree indexes to both the JSONB and the text_test copy column. I tried adding one with explicit '::text' casting, as well as one with 'text_pattern_ops'.
create index test_table_duedate_iso on test_table using btree(text_test);
create index test_table_duedate_iso_jsonb on test_table using btree((data->>'dueDate'));
create index test_table_duedate_iso_jsonb_cast on test_table using btree(((data->>'dueDate')::text));
create index test_table_duedate_iso_jsonb_cast_pattern on test_table using btree(((data->>'dueDate')::text) text_pattern_ops);
Now if I query an exact value, explain shows it using the 'cast' version of the index. Good.
explain select * from test_table where (data->>'dueDate') = '2020-08-07T11:59:00';
"-> Bitmap Index Scan on test_table_duedate_iso_jsonb_cast (cost=0.00..10.37 rows=261 width=0)"
But if I try it with a >, it does a very slow full scan.
explain analyze select count(*) from test_table where (data->>'dueDate') > '2020-04-14';
--Aggregate (cost=10037.94..10037.95 rows=1 width=8) (actual time=1070.808..1070.813 rows=1 loops=1)
-- -> Seq Scan on test_table (cost=0.00..9994.42 rows=17409 width=0) (actual time=0.069..1057.258 rows=2930 loops=1)
-- Filter: ((data ->> 'dueDate'::text) > '2020-04-14'::text)
-- Rows Removed by Filter: 49298
--Planning Time: 0.252 ms
--Execution Time: 1070.874 ms
So just to check my sanity, I do the same query against the text_test column, it uses it's index as desired:
explain analyze select count(*) from test_table where text_test > '2020-04-14';
--Aggregate (cost=6037.02..6037.03 rows=1 width=8) (actual time=19.979..19.984 rows=1 loops=1)
-- -> Bitmap Heap Scan on test_table (cost=77.76..6030.14 rows=2754 width=0) (actual time=1.354..11.007 rows=2930 loops=1)
-- Recheck Cond: (text_test > '2020-04-14'::text)
-- Heap Blocks: exact=455
-- -> Bitmap Index Scan on test_table_duedate_iso (cost=0.00..77.07 rows=2754 width=0) (actual time=1.215..1.217 rows=2930 loops=1)
-- Index Cond: (text_test > '2020-04-14'::text)
--Planning Time: 0.145 ms
--Execution Time: 20.041 ms
I have also tested indexing a numerical field within the JSON and it actually works properly, using it's index for ranged type queries. So it's something about the text field or something I'm doing wrong with it.
PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-14), 64-bit