0

I have a large database of products. It has a one to many relationship to another table of prices. I can easily get, with one query, the MIN, MAX and AVG of a particular category.

SELECT 
  MIN(gbp.price) AS min,
  ROUND(AVG(gbp.price),2) AS ave,
  MAX(gbp.price) AS max
FROM sku AS s
  INNER JOIN price gbp ON gbp.sid = s.id

However, I also want to be able to get the title of the product it relates to as well - I cannot get this resolved despite multiple searches and rewrites.

My data is similar to...

prod_id | title
===============
1       | prod1
2       | prod2
3       | prod3
4       | prod4
5       | prod5
6       | prod6
7       | prod7

price_id | prod_id | price | price_date
=======================================
1        | 1       | 2.99  | 2015/02/01
2        | 1       | 3.99  | 2015/02/12
3        | 2       | 12.99 | 2015/02/01
4        | 3       | 15.99 | 2015/02/01
5        | 4       | 29.99 | 2015/02/01
6        | 5       | 29.99 | 2015/02/01
7        | 5       | 24.99 | 2015/02/12
8        | 6       | 2.99  | 2015/02/01
9        | 7       | 99.99 | 2015/02/01
10       | 7       | 89.99 | 2015/02/12

I am going to presume that other people may want a query writing similar to this, so I am going to ask for two answers.

First one "simply" to return this...

min  | min_title | ave   | max   | max_title
============================================
2.99 | prod1     | 31.39 | 99.99 | prod7 

However, the real answer I want (despite the fact I cannot even solve the above) is where it gets even trickier.

The actual results I want is in the table below...

min  | min_title | ave   | max   | max_title
============================================
2.99 | prod6     | 25.85 | 89.99 | prod7 

The min is 2.99 for prod6 as the 2.99 price for prod1 has expired.

The max is 89.99 for prod7 as the 99.99 price for prod7 has expired.

The ave is 25.85 because of above and because the price for prod5 is 24.99.

I am not expecting answers for everything, just answering the first question (in bold) will likely lead me to the answer for the second part (as I have similar queries that get the latest price etc).

themrflibble
  • 303
  • 3
  • 10

3 Answers3

0

To solve your first output just use join to get those values:

SELECT min, mint.title, ave, max, maxt.title
FROM (
    SELECT 
      MIN(gbp.price) AS min,
      ROUND(AVG(gbp.price),2) AS ave,
      MAX(gbp.price) AS max
    FROM (SELECT price 
          FROM price AS gbp 
          INNER JOIN sku s2 ON gbp.sid = s2.id 
          ORDER BY prdate DESC 
          LIMIT 0, 1) AS s
    INNER JOIN price gbp ON gbp.sid = s.id
) inq
JOIN price minp ON inq.min = minp.price
JOIN price maxp on inq.max = maxp.price
JOIN prod mint ON minp.prod_id = mint.prod_id
JOIN prod maxt ON maxp.prod_id = maxt.prod_id

I don't understand the rules for your second output.

Hogan
  • 69,564
  • 10
  • 76
  • 117
  • When the prices aren't unique this is going to cause a mess. – shawnt00 Feb 13 '15 at 18:09
  • @shawnt00 - no mess... more than one row. This is expected, you have more than one result. – Hogan Feb 13 '15 at 18:10
  • When you have 2 products with min price and then 3 with max price you'll get 6 rows total with all pairs between them. – shawnt00 Feb 13 '15 at 18:22
  • He wants to exclude expired prices in the second output, I believe. – shawnt00 Feb 13 '15 at 18:37
  • Forgetting the fact I want to forget the expired prices, once I use a GROUP BY this one works. Without the GROUP BY I get repeated results where they are all the same. So I am happy to state that round one (my question in bold) WORKS! (My pizza is now done, will return shortly) – themrflibble Feb 13 '15 at 18:55
  • expired prices? this might just be an additional where clause to the sub-query (add on `WHERE `) to the query. – Hogan Feb 13 '15 at 18:57
  • I'm pretty sure the expired price part is going to make the query much, much more complicated. I have solved it on another query with this `(SELECT price FROM price AS gbp INNER JOIN sku s2 ON gbp.sid = s2.id ORDER BY prdate DESC LIMIT 0, 1)`. This will need to be done for every `sku` in the table to correctly work out the average, min and max. – themrflibble Feb 13 '15 at 19:43
  • @themrflibble - just replace `FROM sku AS s` with this query enclosed in `()` -- I made this change in my answer, I also changed the subquery alias to not be a reserved word – Hogan Feb 13 '15 at 20:03
  • Thank you for your perseverance @Hogan - I simplified the data and the query and when I use the modification above for the master, I get no results. Again, I understand the logic you have used and it looks like the right way to do things. I will try and track what is causing this and give some more details. – themrflibble Feb 13 '15 at 20:49
  • My gut is telling me that the LIMIT 0,1 part is causing it to fail. This should be the way to get the data for the latest price, but I think it's just stopping it working at all. – themrflibble Feb 13 '15 at 20:59
  • `LIMIT 0,1` is the same as `LIMIT 1` and should be fine, this will give you the first row which you said is the valid rule for getting current items. – Hogan Feb 13 '15 at 21:01
  • I think I have improved on it by some of the suggestions you made. I have ended up with the from being `FROM (SELECT gbp2.price, gbp2.prdate, s.* FROM price AS gbp INNER JOIN (SELECT id, sid, price, prdate FROM price ORDER BY prdate DESC) AS gbp2 USING (sid) INNER JOIN sku s ON s.id = gbp.sid GROUP BY sid) AS s`. This negates the need for multiple joins as this table has all the data I require, and makes sure that the most current date is the one being calculated. It's pretty quick compare to other options to. I couldn't have got to this without your suggestions, to which you have my thanks. – themrflibble Feb 13 '15 at 21:58
0

This is essentially two different queries (or three if you count the average). The cross join is just horizontally splicing together the two results for min and max. They could all obviously be separated and executed individually.

with current_prices as (
    select price_id, prod_id, price
    from prices
    where price_date = (
        select max(price_date)
        from prices as prices2
        where prices2.prod_id = prices.prod_id
    )
),
min_current_prices as (
    select price, min(prod_id) as prod_d /* arbitrary selected representative */
    from current_prices
    where price = (
        select min(price)
        from current_prices
    )
    group by price
),
max_current_prices as (
    select price, min(prod_id) as prod_id /* arbitrary selected representative */
    from current_prices
    where price = (
        select max(price)
        from current_prices
    )
    group by price
)
select
    m1.price, prod1.title,
    (select avg(price) from current_prices) as ave,
    m2.price, prod2.title
from
    min_current_prices as m1 inner join products as prod1 on prod1.prod_id = m1.prod_id
    max_current_prices as m2 inner join products as prod2 on prod2.prod_id = m2.prod_id

I feel that this seems too complicated and yet you're asking for something very unusual. There clearly could be products with the same min/max price so this is going to cause problems when there is more than one at either end.

If your platform doesn't support WITH then just substitute the full query instead:

select
    min_current_price.price as min_price, min_prod.title as min_title,
    (
        select avg(price)
        from prices
        where price_date = (
            select max(price_date)
            from prices as prices2
            where prices2.prod_id = prices.prod_id
        )
    ) as ave,
    max_current_price.price as max_price, max_prod.title as max_title
from
(
    select price, min(prod_id) as prod_id /* arbitrarily selected representative */
    from (
        select *
        from prices
        where price_date = (
            select max(price_date)
            from prices as prices2
            where prices2.prod_id = prices.prod_id
            )
        ) as current_prices
    where price = (
        select min(price)
        from prices
        where price_date = (
            select max(price_date)
            from prices as prices2
            where prices2.prod_id = prices.prod_id
            )
        )
    group by price
) as min_current_price

cross join

(
    select price, min(prod_id) as prod_id /* arbitrarily selected representative */
    from (
        select *
        from prices
        where price_date = (
            select max(price_date)
            from prices as prices2
            where prices2.prod_id = prices.prod_id
            )
        ) as current_prices
    where price = (
        select max(price)
        from prices
        where price_date = (
            select max(price_date)
            from prices as prices2
            where prices2.prod_id = prices.prod_id
            )
        )
    group by price
) as max_current_price

inner join products as min_prod on min_prod.prod_id = min_current_price.prod_id
inner join products as max_prod on max_prod.prod_id = max_current_price.

Here's a hack for doing it in mysql using limits and sorting methods:

select
    minprice.price as min_price, minprod.title as min_title,
    (
        select avg(price)
        from prices
        where price_date = (
            select max(price_date)
            from prices as prices2
            where prices2.prod_id = prices.prod_id
        )
    ) as ave,
    maxprice.price as max_price, maxprod.title as max_title
from
(
    select price_id, price, prod_id
    from prices
    where not exists ( /* another way of excluding expired prices */
        select 1 from prices as p2
        where p2.prod_id = prices.prod_id and p2.price_date > prices.prod_id
    )
    order by price asc
    limit 0, 1
) as minprice,

(
    select price_id, price, prod_id
    from prices
    where not exists (
        select 1 from prices as p2
        where p2.prod_id = prices.prod_id and p2.price_date > prices.prod_id
    )
    order by price desc
    limit 0, 1
) as maxprice

inner join prod as minprod on minprod.prod_id = minprice.prod_id
inner join prod as maxprod on min.prod_id = maxprice.prod_id
shawnt00
  • 16,443
  • 3
  • 17
  • 22
  • WOW! This is why this website is awesome. You have just showed me a function that I did not know existed nor how to use it. The with statement could have cracked it, and the above looks like it may well do the trick. I am going to test all of the above (there's a couple to go through now) and see which one gets the best result (and cook my dinner). Can't thank you enough!! – themrflibble Feb 13 '15 at 18:23
  • there is a reason you don't know about it. `with` is not supported in mysql. http://stackoverflow.com/questions/324935/mysql-with-clause – Hogan Feb 13 '15 at 18:25
  • I wasn't sure about mysql support. – shawnt00 Feb 13 '15 at 18:26
  • @shawnt00 - nor is row_number() -- both would make this question easier to answer. – Hogan Feb 13 '15 at 18:28
  • You no doubt took a long time to come up with this solution, however, I just keep getting errors and there's so many nested subqueries, I'm finding it difficult to know where the error lies. I also suspect that this query would take a long time to run, and therefore, even though it could work, the time it would take to run may mean I just need to think of a simpler way and then do a completely new AJAX query to find these other bits out? – themrflibble Feb 13 '15 at 21:24
  • I had screwed up some of the parentheses. The newer approaches to these kinds of problem are to use analytic functions. If you want to break it down into pieces you could certainly temp tables. I don't think the execution plan is going to be too bad since I think the optimizer will recognize the repeated subqueries. This last version seemed to be working in the sqlfiddle I tested on so hopefully you have more confidence in it. – shawnt00 Feb 13 '15 at 22:01
0
SELECT t1.min, s.title AS min_title, t1.ave, t1.max, s2.title AS max_title
FROM (SELECT 
        MIN(gbp.price) AS min,
        ROUND(AVG(gbp.price),2) AS ave,
        MAX(gbp.price) AS max
    FROM sku AS s
        INNER JOIN price gbp ON (gbp.sid = s.id)
) t1
  INNER JOIN (SELECT gbp.price, MAX(gbp.prod_id) AS MaxProdID
    FROM price gbp
    WHERE NOT EXISTS(
        SELECT p2.price_id 
        FROM price p2 
        WHERE p2.price_id > gbp.price_id 
        AND p2.prod_id = gpb.prod_id
    )
    GROUP BY gbp.price
  ) minprice ON (minprice.price = t1.min)
  INNER JOIN sku s ON (s.id = minprice.MaxProdID)
  INNER JOIN (SELECT gbp.price, MAX(gbp.prod_id) AS MaxProdID
    FROM price gbp
    WHERE NOT EXISTS(
        SELECT p2.price_id 
        FROM price p2 
        WHERE p2.price_id > gbp.price_id 
        AND p2.prod_id = gpb.prod_id
    )
    GROUP BY gbp.price
  ) maxprice ON (maxprice.price = t1.max)
  INNER JOIN sku s2 ON (s2.id = maxprice.MaxProdID)
arnoudhgz
  • 1,284
  • 9
  • 17
  • This is the one out of the replies, I tested first. This all seems easier to understand to me, however after integrating the above with my big db I got quite spurious results. To note, there's a few typos above (minprice table having MAX in it, the last inner join I changed s.id = max... to s2.id = max... etc.), however the jist of what you meant was understood and I changed accordingly. So, bad news is, not working for me however I will have a go at expanding upon this. Thanks. – themrflibble Feb 13 '15 at 18:40
  • Yeah, sorry I missed the expiration date part, so I got the max prod_id for the price, so the MAX in the minprice table was intended to be so....btw is the price_date a date column? if so the `t1` subsquery can be smarter – arnoudhgz Feb 13 '15 at 18:47
  • Yes, the `price_date` is datetime. I suppose there's two ways to deal with it... either whichever id is greater is the newest price, or whichever date is the greater is the newer price. – themrflibble Feb 13 '15 at 18:58
  • I've added a `NOT EXISTS`, please try – arnoudhgz Feb 13 '15 at 19:10