0
SELECT DISTINCT options.id, options.foo_option_id, options.description
FROM vehicles 
INNER JOIN vehicle_options     ON vehicle_options.vehicle_id = vehicles.id 
INNER JOIN options             ON options.id = vehicle_options.option_id 
INNER JOIN discounted_vehicles ON vehicles.id = discounted_vehicles.vehicle_id 
WHERE discounted_vehicles.discount_id = 4;

The query above returns me 2067 rows, and it runs locally in 1.7 seconds. I'm wondering if it is as fast as it can be or if I can tweak it further somehow, as this dataset will grow fast overtime.

Things that I've tried without change in speed:

1 - Change the join order, joining from the smallest to the biggest table.

2 - Adding an index to discounted_vehicles.discount_id.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
fabriciofreitag
  • 2,843
  • 22
  • 26
  • 1
    Check the [explain plan](https://www.postgresql.org/docs/9.4/static/using-explain.html), and see if you can add any indexes, or whether any query changes will help – Sualeh Fatehi May 04 '18 at 01:09
  • 2
    You can't optimize a query without having an estimate of the sizes of the data sets, and once you have an estimate, the best way to check the performance is to fill up tables to that size and see what happens. It depends on your hardware and heavily on *what* the data is. If most rows only join to one row in the other table, that's a hugely different case than if most rows join to 200 other rows. The bottom line is that the best advice anyone can give you is to profile it and stop guessing. You can also set concrete performance goals (expected time of completion) and see if you're meeting them – jpmc26 May 04 '18 at 02:07
  • Please provide information as described in the [tag info of \[postgresql-performance\]](https://stackoverflow.com/tags/postgresql-performance/info). – Erwin Brandstetter May 04 '18 at 02:29

4 Answers4

7

The best query depends on missing information.
This should be substantially faster in a typical setup:

SELECT id, foo_option_id, description
FROM   options o
WHERE  EXISTS (
   SELECT
   FROM   discounted_vehicles d
   JOIN   vehicle_options vo USING (vehicle_id)
   WHERE  d.discount_id = 4
   AND    vo.option_id = o.id
   );

Assuming referential integrity, enforced by FK constraints, we can omit the table vehicle from the query and join from discounted_vehicles to vehicle_options directly.

Also, EXISTS is typically faster if there are many qualifying rows per distinct option.

Ideally, you'd have multicolumn indexes on:

discounted_vehicles(discount_id, vehicle_id)
vehicle_options(vehicle_id, option_id)

Index columns in this order. You probably have a PK constraint on vehicle_options providing the 2nd index, but the column order should match. Related:

Depending on actual data distribution, there may be faster query styles. Related:

Changing the join order is typically futile. Postgres reorders joins any way it expects to be fastest. (Exceptions apply.) Related:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

1 - Change the join order, joining from the smallest to the biggest table.

Behind the scenes PostgreSQL rearranges the order of the tables according to the explain plan the SQL optimizer designs. The order you wrote is not meaningful.

2 - Adding an index to discounted_vehicles.discount_id.

It depends on how selective the discount_id column is. Do you think it will filter out 95% of the rows leaving only 5%? If it leaves 5% or less, an index will help. Otherwise a full table scan is faster.

Additionally, if it's not there yet, I would add the index:

vehicle_options (vehicle_id)

But maybe it's already created by the foreign key.

The Impaler
  • 45,731
  • 9
  • 39
  • 76
0

Try using groupby instead of distinct

SELECT 
    "options"."id",
    "options"."foo_option_id",
    "options"."description"
FROM
    "vehicles" 
    INNER JOIN "vehicle_options" ON "vehicle_options"."vehicle_id" = "vehicles"."id" 
    INNER JOIN "options" ON "options"."id" = "vehicle_options"."option_id" 
    INNER JOIN "discounted_vehicles" ON "vehicles"."id" = "discounted_vehicles"."vehicle_id" 
WHERE 
    "discounted_vehicles"."discount_id" = 4 
GROUP BY 
    "options.id";
Alan
  • 1,378
  • 2
  • 19
  • 24
0

Although, you need to create necessary indexes before that try to run below query

SELECT "options"."id", "options"."foo_option_id",
    "options"."description"
  FROM "vehicles" 
  INNER JOIN "vehicle_options" 
    ON "vehicle_options"."vehicle_id" = "vehicles"."id" 
  INNER JOIN "options" 
    ON "options"."id" = "vehicle_options"."option_id" 
  INNER JOIN "discounted_vehicles" 
    ON "vehicles"."id" = "discounted_vehicles"."vehicle_id" 
  WHERE "discounted_vehicles"."discount_id" = 4
  GROUP BY options"."id", "options"."foo_option_id",
    "options"."description"
Rajat Jaiswal
  • 645
  • 4
  • 15