
I have a PostgreSQL table with text[] (array) columns defined on it. I'm using these columns to search for specific records like this:

select obj from business
where ((('street' = ANY (address_line_1)
    and 'a_city' = ANY (city)
    and 'a_state' = ANY (state))
or    ('street' = ANY (address_line_1)
    and '1234' = ANY (zip_code)))
and ('a_business_name' = ANY (business_name)
    or 'a_website' = ANY (website_url)
    or array['123'] && phone_numbers))

The problem I'm having is that with about 1 million records, the query gets really slow. My question is simple: do array columns have different types of indexes? Does anybody know the best type of index to create in this case (assuming there are different types)?

Just in case, this is the EXPLAIN ANALYZE output:

"Seq Scan on business  (cost=0.00..207254.51 rows=1 width=32) (actual time=18850.462..18850.462 rows=0 loops=1)"
"  Filter: (('a'::text = ANY (address_line_1)) AND (('a'::text = ANY (business_name)) OR ('a'::text = ANY (website_url)) OR ('{123}'::text[] && phone_numbers)) AND ((('a'::text = ANY (city)) AND ('a'::text = ANY (state))) OR ('1234'::text = ANY (zip_code))))"
"  Rows Removed by Filter: 900506"
"Total runtime: 18850.523 ms"

Thanks in advance!

Sergio Ayestarán
  • As *always*: your version number of Postgres. The table definition and some sample data would also be very helpful, for instance as sqlfiddle. [Example.](http://sqlfiddle.com/#!12/15569/1) – Erwin Brandstetter Jan 28 '13 at 21:15

1 Answer

You can use a GIN index to speed up searches on array columns.
Use it in combination with the array operators it supports: @> (contains), <@ (is contained by), && (overlaps) and = (array equality); a rewritten query is sketched further down.

For instance:

CREATE INDEX business_address_line_1_idx ON business USING GIN (address_line_1);

Do that for all array columns involved in conditions.
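
Note that a plain value = ANY (array_column) test cannot use such an index; only the array operators above are indexable. As a sketch (reusing the literals from the question), the conditions could be rewritten like this:

-- Rewritten with containment/overlap operators so the GIN indexes apply
SELECT obj
FROM   business
WHERE  ((address_line_1 @> ARRAY['street']
    AND  city           @> ARRAY['a_city']
    AND  state          @> ARRAY['a_state'])
    OR  (address_line_1 @> ARRAY['street']
    AND  zip_code       @> ARRAY['1234']))
AND    (business_name   @> ARRAY['a_business_name']
    OR  website_url     @> ARRAY['a_website']
    OR  phone_numbers   && ARRAY['123']);

With matching GIN indexes in place, the planner can combine bitmap index scans instead of falling back to the sequential scan shown in the EXPLAIN output.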

It might be worth considering normalizing your schema instead. Maybe splitting up the multiple entries into a separate (1:n or n:m) table would serve you better. It often does in the long run, even if it seems like more work at first.
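
For illustration only (the table and column names below are assumptions, not taken from the question), a 1:n layout for the repeating address and phone data could look roughly like this; if business_name and website_url also hold multiple values, they would get their own child tables:

-- Hypothetical normalized layout: one row per address / phone number
CREATE TABLE business (
    business_id   serial PRIMARY KEY,
    business_name text,
    website_url   text
);

CREATE TABLE business_address (
    business_id    integer REFERENCES business,
    address_line_1 text,
    city           text,
    state          text,
    zip_code       text
);

CREATE TABLE business_phone (
    business_id  integer REFERENCES business,
    phone_number text
);

Plain B-tree indexes on the child-table columns would then cover the equality tests, with the original query turning into joins instead of array lookups.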

Erwin Brandstetter
  • I upvoted, especially regarding normalising the database. IMO, array fields should be avoided, especially when used for selecting like this – thaJeztah Jan 28 '13 at 21:39
  • Thank you Erwin, it worked like a charm:"Seq Scan on business (cost=0.00..206323.42 rows=1 width=32) (actual time=1259.974..1259.974 rows=0 loops=1)" " Filter: (('a'::text = ANY (address_line_1)) AND (('a'::text = ANY (business_name)) OR ('a'::text = ANY (website_url)) OR ('{123}'::text[] && phone_numbers)) AND ((('a'::text = ANY (city)) AND ('a'::text = ANY (state))) OR ('1234'::text = ANY (zip_code))))" " Rows Removed by Filter: 900537" "Total runtime: 1260.026 ms" – Sergio Ayestarán Jan 28 '13 at 21:56