2

I've got a PostgreSQL array of strings as a column in a table. I created an index using the GIN method. But ANY queries won't use the index (instead, they're doing a sequential scan of the whole table with a filter). What am I missing?

Here's my migration:

class CreateDocuments < ActiveRecord::Migration
  def up
    create_table :documents do |t|
      t.string :title
      t.string :tags, array: true, default: []
      t.timestamps
    end

    add_index :documents, :tags, using: 'gin'

    (1..100000).each do |i|
      tags = []
      tags << 'even' if (i % 2) == 0
      tags << 'odd' if (i % 2) == 1
      tags << 'divisible by 3' if (i % 3) == 0
      tags << 'divisible by 4' if (i % 4) == 0
      tags << 'divisible by 5' if (i % 5) == 0

      Document.create(
        title: i,
        tags: tags
      )
    end
  end

  def down
    drop_table :documents
  end
end

Here's my query, with the resulting number of rows.

Document.where("'divisible by 5' = ANY (tags)").explain
    Document Load (249.8ms)  SELECT "documents".* FROM "documents" WHERE ('divisible by 5' = ANY (tags))
    D, [2014-03-07T17:09:49.689709 #41937] DEBUG -- :   Document Load (249.8ms)  SELECT "documents".* FROM "documents" WHERE ('divisible by 5' = ANY (tags))
    => EXPLAIN for: SELECT "documents".* FROM "documents"  WHERE ('divisible by 5' = ANY (tags))
                       QUERY PLAN
    -----------------------------------------------------------------
    Seq Scan on documents  (cost=0.00..3500.00 rows=20057 width=69)
      Filter: ('divisible by 5'::text = ANY ((tags)::text[]))
    (2 rows)

Document.where("'divisible by 5' = ANY (tags)").length
    Document Load (258.0ms)  SELECT "documents".* FROM "documents" WHERE ('divisible by 5' = ANY (tags))
    D, [2014-03-07T17:09:55.536517 #41937] DEBUG -- :   Document Load (258.0ms)  SELECT "documents".* FROM "documents" WHERE ('divisible by 5' = ANY (tags))
    => 20000
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
mohith
  • 43
  • 1
  • 5
  • How many rows does the table `products` have? – Ihor Romanchenko Mar 07 '14 at 16:37
  • What is the cardinality of that column?How many unique records reported to the number of rows? – Mihai Mar 07 '14 at 16:43
  • @IgorRomanchenko The table has just over 100,000 rows – mohith Mar 07 '14 at 17:50
  • @Mihai There are over 100,000 unique rows. The column in question has over 18,000 unique values – mohith Mar 07 '14 at 17:52
  • Probably that ANY keyword make the optimizer choose a sequential scan as the best option.How many rows does that query returns? – Mihai Mar 07 '14 at 18:02
  • @Mihai It's above. 3,500 rows. – mohith Mar 08 '14 at 00:20
  • I revised the sample code above to make it even simpler. Any help would be greatly appreciated. – mohith Mar 08 '14 at 01:13
  • I would force the index using `enable_indexscan` and then measure performance versus the sequential scan. This will prove if you need the index or not for this use case, being that Postgresql has correctly chosen the most optimised path (index vs seq scan). – Rots Mar 08 '14 at 04:14

1 Answers1

5

To work with a GIN index use the <@ ("is contained by") operator instead of the ANY construct.

The manual states here that default GIN indexes currently only support these operators (additional functionality is shipped with extensions):

<@
@>
=
&&

So try this query:

Document.where("'{divisible by 5}' <@ tags").explain

Note that the left hand side needs to be in array notation, too, even if it's a single element. The operator <@ works for arrays. Hence '{divisible by 5}'.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228