-3

The query below is grabbing some information about a category of toys and showing the most recent sale price for three levels of condition (e.g., Brand New, Used, Refurbished). The price for each sale is almost always different. One other thing - the sales table row id's are not necessarily in chronological order, e.g., a toy with a sale id of 5 could have happened later than a toy with a sale id of 10).

This query works but is not performant. It runs in a manageable amount of time, usually about 1s. However, I need to add yet another left join to include some more data, which causes the query time to balloon up to about 9s, no bueno.

Here is the working but nonperformant query:

SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN (
    SELECT date_sold, sold_price, catalog_product_id, condition_id
    FROM sales
    WHERE invalid = 0 AND condition_id <= 3
    ORDER BY date_sold DESC
) AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC

But like I said it's slow. The sales table has about 200k rows.

What I tried to do was create the subquery as a view, e.g.,

CREATE VIEW sales_view AS
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC

Then replace the subquery with the view, like

SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN sales_view AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC

Unfortunately, this change causes the query to no longer grab the most recent sale, and the sales price it returns is no longer the most recent.

Why is it that the table view doesn't return the same result as the same select as a subquery?

After reading just about every top-n-per-group stackoverflow question and blog article I could find, getting a query that actually worked was fantastic. But now that I need to extend the query one more step I'm running into performance issues. If anybody wants to sidestep the above question and offer some ways to optimize the original query, I'm all ears!

Thanks for any and all help.

Willem Renzema
  • 5,177
  • 1
  • 17
  • 24
Kris
  • 201
  • 1
  • 8
  • 1
    In the absence of any aggregating functions, a GROUP BY clause is never appropriate. The rest is noise. – Strawberry Jul 04 '20 at 05:20
  • Also, just to observe, you have an outer join on a category table, from which you appear to select no columns. This seems entirely pointless, unless I'm missing something. – Strawberry Jul 04 '20 at 06:33
  • @Kris . . . Your query is malformed. You have an `order by` in a subquery -- and *you* probably think that does something useful (it does not). Your `select` columns are inconsistent with the `group by`. I would suggest that you ask a *new* question with sample data, desired results, and an explanation of what you are trying to accomplish. – Gordon Linoff Jul 04 '20 at 11:33
  • Please provide `EXPLAIN` for each case. – Rick James Jul 04 '20 at 14:46

1 Answers1

0

The solution to the subquery performance issue was to use the answer provided here: Groupwise maximum

I thought that this approach could only be used when querying a single table, but indeed it works even when you've joined many other tables. You just have to left join the same table twice using the s.date_sold < s2.date_sold join condition and make sure the where clause looks for the null value in the second table's id column.

Kris
  • 201
  • 1
  • 8