18

This may be a very simplistic question, so apologies in advance, but I am very new to database usage.

I'd like to have Postgres run its full text search across multiple joined tables. Imagine something like a model User, with related models UserProfile and UserInfo. The search would only be for Users, but would include information from UserProfile and UserInfo.

I'm planning on using a gin index for the search. I'm unclear, however, on whether I'm going to need a separate tsvector column in the User table to hold the aggregated tsvectors from across the tables, and to setup triggers to keep it up to date. Or if it's possible to create an index without a tsvector column that'll keep itself up to date whenever any of the relevant fields in any of the relevant tables change. Also, any tips on the syntax of the command to create all this would be much appreciated as well.

William Jones
  • 18,089
  • 17
  • 63
  • 98

1 Answers1

14

Your best answer is probably to have a separate tsvector column in each table (with an index on, of course). If you aggregate the data up to a shared tsvector, that'll create a lot of updates on that shared one whenever the individual ones update.

You will need one index per table. Then when you query it, obviously you need multiple WHERE clauses, one for each field. PostgreSQL will then automatically figure out which combination of indexes to use to give you the quickest results - likely using bitmap scanning. It will make your queries a little more complex to write (since you need multiple column matching clauses), but that keeps the flexibility to only query some of the fields in the cases where you want.

You cannot create one index that tracks multiple tables. To do that you need the separate tsvector column and triggers on each table to update it.

Magnus Hagander
  • 23,890
  • 5
  • 56
  • 43
  • 3
    Can you go into detail about how the read query would work? I'm trying to find documentation on the best way to do postgresql full-text search across multiple tables and I'm not finding much. One problem I ran into is that if I setup an index on each of the tables that I was querying, postgresql wasn't able to use all the indexes when I did a query. Like: select * from orders left join on users... left join on line_items... where ts_vector('english', orders.id) @@ ... or ts_vector('english', users.name) @@ ... – Joe Van Dyk May 10 '11 at 23:32
  • 2
    I also see that a query searching in two tables does a sequential scan (`SELECT COUNT(*) FROM products WHERE to_tsvector('simple', products.name::text) @@ to_tsquery('simple', 'foo'::text);`), whereas a query searching one table only does a bitmap index scan using the index (`SELECT COUNT(*) FROM products LEFT JOIN brands ON products.brand_id = brands.id WHERE to_tsvector('simple', products.name::text) @@ to_tsquery('simple', 'foo'::text) OR to_tsvector('simple', brands.name::text) @@ to_tsquery('simple', 'foo'::text);`). – wvengen Dec 20 '16 at 10:33