1

PostgreSQL version: 9.6.

The events table has a visitors JSONB column:

CREATE TABLE events (name VARCHAR(256), visitors JSONB);

The visitors column contains a "flat" JSON array:

["John Doe","Frédéric Martin","Daniel Smith",...].

The events table contains 10 million of rows, each row has between 1 and 20 visitors.

Is it possible to index the values of the array to perform efficient pattern-matching searches:

  1. left anchored: select events whose visitors match 'John%'
  2. right anchored: select events whose visitors match '%Doe'
  3. unaccented: select events whose visitors match 'Frederic%'
  4. case-insensitive: select events whose visitors match 'john%'

I am aware of the existence of the Postgres trigram extension gin_trgm_ops enabling to create indexes for case-insensitive and right-anchored searches, but I can't figure out how to create trigram indexes for the content of "flat" JSON arrays.

I read Pattern matching on jsonb key/value and Index for finding an element in a JSON array but the solutions provided do not seem to apply to my use case.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • Cast it to `text` and use regular expressions with a trigram index. – Laurenz Albe Apr 03 '19 at 17:22
  • @Laurenz Albe thanks for the suggestion. I already tried to index the whole array as text (`create index events_visitors_trgm_idx on events using GIN ((visitors::text) gin_trgm_ops);`) but we cannot then perform searches such as `select * from events where visitors::text = 'John Doe'` because since the array is a flat text, we have to systematically use the `like '%John Doe%'` operator, which does not fit my use case. – Pierre Sion Apr 04 '19 at 08:29
  • That's why I suggested regular expressions. Search for `visitors ~ '\mJohn Doe\M'`. – Laurenz Albe Apr 04 '19 at 11:21
  • Thanks for clarifying things. This regular expression approach works well, it can be considered as an answer. – Pierre Sion Apr 04 '19 at 11:56

1 Answers1

1

You should cast the jsonb to text and create a trigram index on it:

CREATE EXTENSION pg_trgm;
CREATE INDEX ON events USING gin
   ((visitors::text) gin_trgm_ops);

Then use regular expression searches on the column. For example, to search for John Doe, you can use:

SELECT ...
FROM events
WHERE visitors::text *~ '\mJohn Doe\M';

The trigram index will support this query.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263