1

I have a bunch of tables in PostgreSQL and I run a query as follows:

SELECT DISTINCT ON ...some stuff... 
FROM "rent_flats" 
INNER JOIN "rent_flats_linked_users" 
  ON "rent_flats_linked_users"."rent_flat_id" = "rent_flats"."id" 
INNER JOIN "users" 
  ON "users"."id" = "rent_flats_linked_users"."user_id" 
INNER JOIN "owners" 
  ON "owners"."id" = "users"."profile_id" AND "users"."profile_type" = 'Owner' 
INNER JOIN "phone_numbers" 
  ON "phone_numbers"."person_id" = "owners"."id" AND "phone_numbers"."person_type" = 'Owner' 
INNER JOIN "phone_number_categories" 
  ON "phone_number_categories"."id" = "phone_numbers"."phone_number_category_id" 
INNER JOIN "localities" 
  ON "localities"."id" = "rent_flats"."locality_id" 
INNER JOIN "regions" 
  ON "regions"."id" = "localities"."region_id" 
INNER JOIN "cities" 
  ON "cities"."id" = "regions"."city_id" 
INNER JOIN "property_types" 
  ON "property_types"."id" = "rent_flats"."property_type_id" 
INNER JOIN "apartment_types" 
  ON "apartment_types"."id" = "rent_flats"."apartment_type_id" 
WHERE "rent_flats"."status" = 3 
  AND (((extract(epoch from age(current_date,rent_flats.date_added))/86400)::int) IN (cities.short_period,cities.long_period)) 
  AND (phone_number_categories.name IN ('SMS','SMS & Mobile')) 
ORDER BY rf_id, phone_numbers.priority ASC

Note: The rent_flats table contains around 5 million rows, and rent_flats_linked_users contains around 600k rows and users contains 350k rows.Other tables are small in size.

The query takes about 6.8 secs to execute and the explain analyses shows that around 99% of the total time goes in Hash and Hash joins.

On setting seq_scan to off...the query takes even longer to ~11 secs

Here's the explain query plan analyses. I have put indices on the fields involved in the inner joins as well as on fields involved in the filters like phone_numbers.priority and cities.short_period and cities.long_period. How can I optimise this further and reduce the Hash and Hash Joins times?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
aceBox
  • 711
  • 2
  • 13
  • 30
  • @DrewPierce No its in postgres...but aren't Hash and Hash joins in mysql too, and aren't the concepts same? – aceBox Jul 27 '15 at 11:41
  • The best solution depends on the missing parts. My educated guess is you have `DISTINCT ON (rent_flats.id) rent_flats.id AS rf_id, ...` Burt why guess when you can just tell us? Please also provide other missing information as instructed in the tag info for [\[postgresql-performance\]](http://stackoverflow.com/tags/postgresql-performance/info). Most importantly your version of Postgres and the basic table definition of the three big tables. Also clarify if any of the joins can find 0 or more than 1 matches on the left side. – Erwin Brandstetter Jul 27 '15 at 12:28
  • `(((extract(epoch from age(current_date,rent_flats.date_added))/86400)::int) IN (cities.short_period,cities.long_period))` will not be able use any available indexes. (and causing a seqscan on rent_flats). Should probably be rewritten (by casting to date and substracting) – joop Jul 27 '15 at 12:34

1 Answers1

0

Your second WHERE clause is not sargable:

 AND (((extract(epoch from age(current_date,rent_flats.date_added))/86400)::int) IN (cities.short_period,cities.long_period)) 

If the involved columns are date and integer types (which we could see in the table definition), you can rewrite as:

AND rent_flats.date_added IN (current_date - cities.short_period - 1
                            , current_date - cities.long_period - 1)

Which an odd predicate. Are you sure you don't mean this?

AND rent_flats.date_added BETWEEN current_date - cities.short_period - 1
                              AND current_date - cities.long_period - 1

You can probably do a lot more, pending missing information. Most likely along these lines:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228