1
table: users
columns: id

table: products
columns: id

table: product_images
columns: product_id |  path

table: cart
columns: user_id | product_id 

product_id in the bottom 2 tables corresponds to id in the products table. user_id in the cart table corresponds to id in the users table.

My friend and I are developing a generic shopping website just for fun.

The requirement: Given a user_id, go through the user's shopping cart and return all corresponding rows in the products table and with exactly 1 image for the product (there may be multiple images).

This is our query (which seems to work):

SELECT products.*, product_images.path
FROM products
INNER JOIN cart 
ON cart.product_id = products.id AND cart.user_id = 13  /* arbitrary id */
LEFT OUTER JOIN product_images 
ON product_images.product_id = cart.product_id
GROUP BY cart.product_id

The first join is intuitive to me because both tables involved in the join are joining on columns within each table.

It is the 2nd join that I am confused about.
My understanding is that the first join will produce a virtual table. This virtual table is then joined with the product_images table, but the confusing part is that the ON condition is not using a column belonging directly to the virtual table.

So what exactly is going on here?

Please note that I know how to rewrite the query in a way that is more intuitive to me, and that it is understanding the concept that is important to me.

At first I thought this was uncommon, but I noticed w3schools does the same thing at the bottom of this link: http://www.w3schools.com/sql/sql_groupby.asp

Kacy
  • 3,330
  • 4
  • 29
  • 57

1 Answers1

1

cart and products are inner joined.

The virtual table going on to the next join includes all columns from cart and all columns from products.

This virtual/logical table will be as below.

SELECT products.id,
       cart.user_id,
       cart.product_id
FROM   products
       INNER JOIN cart
         ON cart.product_id = products.id
            AND cart.user_id = 13;

So there are three columns in this virtual table. products.id,cart.user_id,cart.product_id

This virtual table is then left joined onto product_images using a column from product_images and cart.product_id from the virtual table.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • Sorry for the late response. I've been reading and rereading what you've said while trying out different (but similar) queries. I don't understand what you mean by "This is then left joined onto product_images using a column from product_images and a column from cart." My understanding is that the right table involved in this left join will consist of all columns and rows of the product_images table minus any rows that do not have a corresponding product_id in the cart table. Assuming that's true, I still don't understand how the virtual table is joined w this new product_images table. – Kacy Jan 01 '15 at 20:22
  • 1
    @KacyRaye - The left table involved in that join is `products * carts` so that brings into scope all columns from products and carts. The right table involved is `product_images` so that brings into scope columns from `product_images`. So the left join predicate can reference columns from any of those three tables. – Martin Smith Jan 01 '15 at 20:25
  • 1
    @KacyRaye And your question says "the confusing part is that the ON condition is not using a column belonging directly to the virtual table." So rows aren't relevant here at all. If you are confused about the result in general see [my answer here](http://stackoverflow.com/questions/38549/difference-between-inner-and-outer-joins/27458534#27458534). The explanation for left outer join is exactly the same it is just that the table on the left is not a base table but is the result of the query in my answer. – Martin Smith Jan 01 '15 at 20:34
  • I went through all the examples you provide and they all make perfect sense, but I feel like this is a separate case. Oh, I think it may have just clicked. Can you confirm? The ON condition for the left join is actually using the product_id column in the virtual table and not some separate instance of the original cart table? – Kacy Jan 01 '15 at 21:00
  • 1
    @KacyRaye - Yes it's using the product_id column from the virtual table (which of course is derived from `cart` originally). – Martin Smith Jan 01 '15 at 21:02
  • Wow, that was the little detail I was missing. Thank you! – Kacy Jan 01 '15 at 21:05
  • 1
    @KacyRaye - I think my answer could have been clearer on that point. Hopefully it is now! – Martin Smith Jan 01 '15 at 21:06
  • Hey Martin, can I ask you something kind of related? Is the Group By using cart.product_id in the virtual table? Also, is the Group By being applied immediately after the last join, or is it applied after the Select completes? – Kacy Jan 01 '15 at 22:24
  • 1
    @KacyRaye - **Logically** the `GROUP BY` happens after the joins and where clause if present. So it would be using the `cart.product_id` column in the virtual table resulting from the second join. The `SELECT` logically runs almost last. The only step that happens after that is `ORDER BY` and `LIMIT`. – Martin Smith Jan 01 '15 at 22:33