0

I have no experience at all with sql/postgres indexing so I'm even unsure how to formulate my question correctly. But here goes...

In one of our JpaRepositories we have a query that looks like this:

    @Query(nativeQuery = true, value =
            "SELECT * FROM instrument i " +
                    "left join instrument_profile ip ON ip.instrument_id = i.id " +
                    "WHERE (ip.profile IS NULL or upper(ip.profile) = :assetMgr)" +
                    " and i.transaction_type = :transactionType "+
                    "  and i.customer_id = :customerId " +
                    "  and ( i.bloomberg_id ilike %:key% OR " +
                    "       i.instrument_short ilike %:key% or " +
                    "       i.instrument_name ilike %:key% OR " +
                    "       i.cusip ilike %:key% OR " +
                    "       i.sedol ilike %:key%) " +
                    "limit :limit")

The query works find but we are now looking for ways to optimize performance overall and table indexing is one of them. The thing is.. I have no clue how to index a table for this kind of query. Partly because it contains a join table, and also we are searching for a "search-key" value in multiple fields. This particular table is expected to contain a lot of records so if we cant create an index that supports the above, we could be forced to change the entire query.

Any suggestions or pointers on how to approach this would be very much appreciated.

/Kat

  • 4
    Limiting without ordering rarelt makes sense. – The Impaler Dec 15 '21 at 14:58
  • 1
    I don't know if Postgres implements index intersection but it's unlikely to work well with several ORd columns; other optimisations might be possible such as specific fitering for each column and unioning the resulting keys. – Stu Dec 15 '21 at 15:00
  • 1
    Check the extension pg_trgm to support an index on the ILIKE searches. https://www.postgresql.org/docs/14/pgtrgm.html#id-1.11.7.42.8 And use EXPLAIN(ANALYZE, VERBOSE, BUFFERS) to get the query plan and execution details about the query. Without that information it will be hard to optimise the query. – Frank Heikens Dec 15 '21 at 15:05
  • @Stu I'm not sure what you mean by "filtering each column". Can you describe this or send a link that explains this technique? – Katarina Stubberud Dec 15 '21 at 15:05
  • 1
    @KatarinaStubberud have a read of [https://stackoverflow.com/questions/13750475/sql-performance-union-vs-or](https://stackoverflow.com/questions/13750475/sql-performance-union-vs-or) – Stu Dec 15 '21 at 15:09
  • How many rows of i do you expect to match `i.transaction_type = :transactionType and i.customer_id = :customerId`? How many to match the parenthesized ORed ilike expression? – jjanes Dec 15 '21 at 17:47
  • @jjanes: transaction_type and customer_id will result in maybe 2-3 k up to 80-90k. It is the search key on the other fields that will limit number of hits to 1-2 rows – Katarina Stubberud Dec 15 '21 at 20:34
  • I am thinking of concatenating the search fields into a separate column that I can index with pg_trgm and use in query. At this point it seems like a way out – Katarina Stubberud Dec 15 '21 at 20:40

2 Answers2

1

The Join columns should be indexed to get better performance. You can try creating index on instrument_id in instrument_profile table and instrument table id column like below

CREATE INDEX instrument_profile_instrument_id_idx ON instrument_profile (instrument_id);
CREATE INDEX instrument_instrument_id_idx ON instrument (id);
JRA
  • 467
  • 5
  • 18
  • Dumb question but... I didnt mention that in the post but "id" on instrument table is a primary key. Is it necessary to create index for that then? – Katarina Stubberud Dec 15 '21 at 15:12
  • @KatarinaStubberud: Check your query plan first, before you start random indexing. And Yes, a primary key is already indexed. However, a foreign key is not. – Frank Heikens Dec 15 '21 at 15:18
  • @KatarinaStubberud A primary key is automatically indexed, no need to make a 2nd index on it. – jjanes Dec 15 '21 at 17:44
0

I am thinking of concatenating the search fields into a separate column that I can index with pg_trgm and use in query. At this point it seems like a way out

You could do that, but you could also just make a pg_trgm index on the 5 separate columns. I would tend to think this is better than an index on a concatenated field. You can write the queries in a more natural way, if you sometimes want you omit one of the columns you can do. It also saves space over duplicating the data in the table.

create index on instrument using gin (bloomberg_id gin_trgm_ops, instrument_short gin_trgm_ops, instrument_name gin_trgm_ops, cusip gin_trgm_ops, sedol gin_trgm_ops);

I would also create the regular (btree) index on the other 2 columns:

create index on instrument (transaction_type , customer_id);

Depending on the row estimates, it might decide to combine these two indexes with a BitmapAnd, or it might decide to just use one or the other and then filter on whatever criteria it choose not to use the index for.

jjanes
  • 37,812
  • 5
  • 27
  • 34