4

I have a products table and a sales table that keeps record of how many items a given product sold during each date. Of course, not all products have sales everyday.

I need to generate a report that tells me how many consecutive days a product has had sales (from the latest date to the past) and how many items it sold during those days only.

I'd like to tell you how many things I've tried so far, but the only succesful (and slow, recursive) ones are solutions inside my application and not inside SQL, which is what I want.

I also have browsed several similar questions on SO but I haven't found one that lets me have a clear idea of what I really need.

I've setup a SQLFiddle here to show you what I'm talking about. There you will see the only query I can think of, which doesn't give me the result I need. I also added comments there showing what the result of the query should be.

I hope someone here knows how to accomplish that. Thanks in advance for any comments!

Francisco

Francisco Zarabozo
  • 3,676
  • 2
  • 28
  • 54

3 Answers3

3

http://sqlfiddle.com/#!2/20108/1

Here is a store procedure that do the job

CREATE PROCEDURE myProc()
BEGIN
    -- Drop and create the temp table
    DROP TABLE IF EXISTS reached;
    CREATE TABLE reached (
    sku CHAR(32) PRIMARY KEY,
    record_date date,
    nb int,
    total int)
   ENGINE=HEAP;

-- Initial insert, the starting point is the MAX sales record_date of each product
INSERT INTO reached 
SELECT products.sku, max(sales.record_date), 0, 0
FROM products
join sales on sales.sku = products.sku
group by products.sku;

-- loop until there is no more updated rows
iterloop: LOOP
    -- Update the temptable with the values of the date - 1 row if found
    update reached
    join sales on sales.sku=reached.sku and sales.record_date=reached.record_date
    set reached.record_date = reached.record_date - INTERVAL 1 day, 
        reached.nb=reached.nb+1, 
        reached.total=reached.total + sales.items;

    -- If no more rows are updated it means we hit the most longest days_sold
    IF ROW_COUNT() = 0 THEN
        LEAVE iterloop;
    END IF;
END LOOP iterloop;

-- select the results of the temp table
SELECT products.sku, products.title, products.price, reached.total as sales, reached.nb as days_sold 
from reached
join products on products.sku=reached.sku;

END//

Then you just have to do

call myProc()
Benos
  • 676
  • 7
  • 17
  • Wow, thank you. I never imagined it would take that much, but it's definitely faster and better than processing it at application level. Thank you very much for taking the time to write this answer. :-) – Francisco Zarabozo Apr 26 '13 at 07:39
  • There is maybe a faster/cleaner way but I was having nearly the same problem 2 weeks ago and wrote a procedure for it. It was easy to adapt it for your question – Benos Apr 26 '13 at 07:41
3

A solution in pure SQL without store procedure : Fiddle

SELECT sku
     , COUNT(1) AS consecutive_days
     , SUM(items) AS items
FROM
(
  SELECT sku
       , items
       -- generate a new guid for each group of consecutive date
       -- ie : starting with day_before is null
       , @guid := IF(@sku = sku and day_before IS NULL, UUID(), @guid) AS uuid
       , @sku := sku AS dummy_sku
  FROM 
  (
    SELECT currents.sku
         , befores.record_date as day_before
         , currents.items
    FROM sales currents
      LEFT JOIN sales befores 
        ON currents.sku = befores.sku 
        AND currents.record_date = befores.record_date + INTERVAL 1 DAY
    ORDER BY currents.sku, currents.record_date
  )  AS main_join
    CROSS JOIN (SELECT @sku:=0) foo_sku
    CROSS JOIN (SELECT @guid:=UUID()) foo_guid
) AS result_to_group
GROUP BY uuid, sku

The query is really not that hard. Declare variables via cross join (SELECT @type:=0) type. Then in the selects, you can set variables value row by row. It is necessary for simulating Rank function.

Cyril Gandon
  • 16,830
  • 14
  • 78
  • 122
  • Wow. I'll have to research a little to understand your query. This will be the first time I use things like `items, @guid := IF(...` and also `cross join`. Thank you, this seems a very aproach. :-) – Francisco Zarabozo Apr 26 '13 at 08:37
  • @FranciscoZarabozo : is is not that hard. Test each subquery and watch the result, then you could understand easily what happens. – Cyril Gandon Apr 26 '13 at 08:40
  • I really like this approach, what about performance vs the store procedure if the sales table grow a lot with time ? Your statement is getting all the chunks of consecutive dates which is great but can be a problem on a big table. Anyway thanks a lot for your query I learned a lot analysing it – Benos Apr 26 '13 at 09:01
-1
select
  p.*,
  sum(s.items) sales,
  count(s.record_date) days_sold
from
  products p
join
  sales s
  on
  s.sku = p.sku
where record_date between '2013-04-18 00:00:00' and '2013-04-26 00:00:00'
group by sku; 
aleroot
  • 71,077
  • 30
  • 176
  • 213