3

I have 2 tables customer and coupons, a customer may or may not have a reward_id assigned to, so it's a nullable column. A customer can have many coupons and coupon belongs to a customer.

+-------------+------------+
|   coupons   | customers  |
+-------------+------------+
| id          | id         |
| customer_id | first_name |
| code        | reward_id  |
+-------------+------------+
customer_id column is indexed

I would like to make a join between 2 tables.

My attempt is:

select c.*, cust.id as cust_id, cust.first_name as cust_name
from coupons c
join customer cust
on c.customer_id = cust.id and cust.reward_id is not null

However, I think there isn't an index on reward_id, so I should move cust.reward_id is not null in where clause:

select c.*, cust.id as cust_id, cust.first_name as cust_name
from coupons c
join customer cust
on c.customer_id = cust.id
where cust.reward_id is not null

I wonder if the second attempt would be more efficient than the first attempt.

Cœur
  • 37,241
  • 25
  • 195
  • 267
tkhuynh
  • 941
  • 7
  • 15
  • Create the missing index instead of moving conditions around. The optimizer will move them anyway, if it sees a benefit from it. – sticky bit May 22 '18 at 23:07
  • @stickybit would the second query is a better choice assuming `reward_id` is indexed? – tkhuynh May 22 '18 at 23:12
  • There is a detailed discussion on what would be the diff in terms of Joins and Joins with where https://dba.stackexchange.com/a/3481 – Rizwan May 22 '18 at 23:28
  • @Rizwan thanks for your comment, but my question is a little different which focuses on the additional condition on `ON` clause vs having the additional condition in `WHERE` clause, and both queries using `JOIN` – tkhuynh May 22 '18 at 23:37
  • 1
    @tkhuynh . . . They should have the same execution plan. – Gordon Linoff May 23 '18 at 03:04
  • 1
    Both will result in exactly the same execution plan –  May 23 '18 at 05:22
  • @GordonLinoff so it means there is no differences in performance, am I right? I did a benchmark on the 2 queries, and I see the second one is slightly faster. – tkhuynh May 24 '18 at 21:15
  • 1
    @tkhuynh . . . That may be due to caching effects. Or if the difference is really small (say < 5%) then just random variation. – Gordon Linoff May 25 '18 at 03:01
  • This is an easily found faq & clearly so.. Before considering posting please read the manual & google any error message or many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings/names & site:stackoverflow.com & tags; read many answers. If you post a question, use one phrasing as title. Reflect your research. See [ask] & the voting arrow mouseover texts. – philipxy Apr 26 '20 at 11:53
  • Does this answer your question? [SQL JOIN - WHERE clause vs. ON clause](https://stackoverflow.com/questions/354070/sql-join-where-clause-vs-on-clause) – philipxy Apr 27 '20 at 05:36
  • Learn basics of relational/SQL query optimization/implementation. These are trivially mapped to the same implementation by the simplest query engine. Also it would be helpful to think about why you think you have a reason to think your guesses are justified. (It's better when you say "I wonder", but "I wonder" isn't a question. However "efficient" doesn't mean anything in particular.) – philipxy Apr 28 '20 at 05:09

2 Answers2

1

It would be better if you see the execution plan on your own. Add EXPLAIN ANALYZE before your select statement and execute both to see the differences.

Here's how:

EXPLAIN ANALYZE select ...

What it does? It actually executes the select statement and gives you back the execution plan which was chosen by query optimizer. Without ANALYZE keyword it would only estimate the execution plan without actually executing the statement in the background.

Database won't use two indexes at one time, so having an index on customer(id) will make it unable to use index on customer(reward_id). This condition will actually be treated as a filter condition which is correct behaviour.

You could experiment with performance of a partial index created as such: customer(id) where reward_id is not null. This would decrease index size as it would only store these customer id's for which there is a reward_id assigned.

I generally like to split the relationship/join logic from conditions applied and I myself put them within the WHERE clause because it's more visible in there and easier to read for future if there are any more changes.

I suggest you see for yourself the possible performance gain, because it depends on how much data there is and the possible low cardinality for reward_id. For example if most rows have this column filled with a value it wouldn't make that much of a difference as the index size (normal vs partial) would be almost the same.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Kamil Gosciminski
  • 16,547
  • 8
  • 49
  • 72
  • I did use `EXPLAIN ANALYZE` to see the performance of the 2 queries and I notice that with a large number of records the second query perform better. However I still want to confirm whether moving `cust.reward_id is not null` to `WHERE` clause would be a better choice. – tkhuynh May 22 '18 at 23:55
0

In a PostgreSQL inner join, whether a filter condition is placed in the ON clause or the WHERE clause does not impact a query result or performance.

Here is a guide that explores this topic in more detail: https://app.pluralsight.com/guides/using-on-versus-where-clauses-to-combine-and-filter-data-in-postgresql-joins

Jacek Trociński
  • 882
  • 1
  • 8
  • 23
  • Theoretically yes, but in practice the optimizer returns different plans. Basic example here: https://stackoverflow.com/questions/65673841/sql-join-in-postgresql-different-execution-plan-in-where-clause-than-in-on-cla Any insight you might have would be appreciated. – jmiserez Feb 12 '21 at 09:38