Optimising Hash And Hash Joins in SQL Query

Question

I have a bunch of tables in PostgreSQL and I run a query as follows:

SELECT DISTINCT ON ...some stuff... 
FROM "rent_flats" 
INNER JOIN "rent_flats_linked_users" 
  ON "rent_flats_linked_users"."rent_flat_id" = "rent_flats"."id" 
INNER JOIN "users" 
  ON "users"."id" = "rent_flats_linked_users"."user_id" 
INNER JOIN "owners" 
  ON "owners"."id" = "users"."profile_id" AND "users"."profile_type" = 'Owner' 
INNER JOIN "phone_numbers" 
  ON "phone_numbers"."person_id" = "owners"."id" AND "phone_numbers"."person_type" = 'Owner' 
INNER JOIN "phone_number_categories" 
  ON "phone_number_categories"."id" = "phone_numbers"."phone_number_category_id" 
INNER JOIN "localities" 
  ON "localities"."id" = "rent_flats"."locality_id" 
INNER JOIN "regions" 
  ON "regions"."id" = "localities"."region_id" 
INNER JOIN "cities" 
  ON "cities"."id" = "regions"."city_id" 
INNER JOIN "property_types" 
  ON "property_types"."id" = "rent_flats"."property_type_id" 
INNER JOIN "apartment_types" 
  ON "apartment_types"."id" = "rent_flats"."apartment_type_id" 
WHERE "rent_flats"."status" = 3 
  AND (((extract(epoch from age(current_date,rent_flats.date_added))/86400)::int) IN (cities.short_period,cities.long_period)) 
  AND (phone_number_categories.name IN ('SMS','SMS & Mobile')) 
ORDER BY rf_id, phone_numbers.priority ASC

Note: The rent_flats table contains around 5 million rows, and rent_flats_linked_users contains around 600k rows and users contains 350k rows.Other tables are small in size.

The query takes about 6.8 secs to execute and the explain analyses shows that around 99% of the total time goes in Hash and Hash joins.

On setting seq_scan to off...the query takes even longer to ~11 secs

Here's the explain query plan analyses. I have put indices on the fields involved in the inner joins as well as on fields involved in the filters like phone_numbers.priority and cities.short_period and cities.long_period. How can I optimise this further and reduce the Hash and Hash Joins times?

@DrewPierce No its in postgres...but aren't Hash and Hash joins in mysql too, and aren't the concepts same? — aceBox, Jul 27 '15 at 11:41
The best solution depends on the missing parts. My educated guess is you have `DISTINCT ON (rent_flats.id) rent_flats.id AS rf_id, ...` Burt why guess when you can just tell us? Please also provide other missing information as instructed in the tag info for [\[postgresql-performance\]](http://stackoverflow.com/tags/postgresql-performance/info). Most importantly your version of Postgres and the basic table definition of the three big tables. Also clarify if any of the joins can find 0 or more than 1 matches on the left side. — Erwin Brandstetter, Jul 27 '15 at 12:28
`(((extract(epoch from age(current_date,rent_flats.date_added))/86400)::int) IN (cities.short_period,cities.long_period))` will not be able use any available indexes. (and causing a seqscan on rent_flats). Should probably be rewritten (by casting to date and substracting) — joop, Jul 27 '15 at 12:34

score 0 · Answer 1 · edited May 23 '17 at 11:43

0

Your second WHERE clause is not sargable:

 AND (((extract(epoch from age(current_date,rent_flats.date_added))/86400)::int) IN (cities.short_period,cities.long_period))

If the involved columns are date and integer types (which we could see in the table definition), you can rewrite as:

AND rent_flats.date_added IN (current_date - cities.short_period - 1
                            , current_date - cities.long_period - 1)

Which an odd predicate. Are you sure you don't mean this?

AND rent_flats.date_added BETWEEN current_date - cities.short_period - 1
                              AND current_date - cities.long_period - 1

You can probably do a lot more, pending missing information. Most likely along these lines:

edited May 23 '17 at 11:43

Community

1
1

answered Jul 27 '15 at 12:43

Erwin Brandstetter

605,456
145
1,078
1,228

@joop: You already hinted in the same direction while I was still working on the rewritten condition. – Erwin Brandstetter Jul 27 '15 at 13:00
@ErwinBrandstetter That's good...Whatif date_added is a timestamp? – aceBox Jul 27 '15 at 13:20
1

@aceBox: Can be done easily. Depends on the other columns, too. Please provide the *complete* picture *in the question*. Iterating through bits in comments is too tedious. – Erwin Brandstetter Jul 27 '15 at 13:24
@ErwinBrandstetter thats fine...I did it...but the overall query time hasn't improved. Still around 99% time is spent in Hash & Hash Joins. – aceBox Jul 27 '15 at 13:31
1

@aceBox: As I said: `You can probably do a lot more, pending missing information.` – Erwin Brandstetter Jul 27 '15 at 13:57

Optimising Hash And Hash Joins in SQL Query

1 Answers1