0

I have a rather complex query I need some help with.

basically my table has four fields:

Start Date,
End Date, 
Item, 
Type

I also need to get the number of days

In order to group them correctly, I need to look at each row, and check if the dates in Start Date Are consecutive, if they are group them, if not it goes into a different group

So if the following dates were in the database:

2013-10-23
2013-10-24
2013-10-28
2013-10-29

Then it should return two rows

2(number of days), 2013-10-23 (startdate) ,2013-10-24 (last consecutive date) , Item, Type
2,2013-10-28,2013-10-29,Item,Type

Then to take things to a more complex level, the grouping needs to be based on the Item and Type also being the same

So that is the following data (StartDate,Item,Type)

2013-10-23,ABC,EFG
2013-10-24,XYZ,WXY
2013-10-28,ABC,EFG
2013-10-29,ABC,EFG

Then the first two would not be grouped because the Item and/or Type are not the same as each other, even though the dates are consecutive.

However the last two would group together because the dates are consecutive and the Item and Type are the same as each other, with wanted result:

number of days  startdate   last consecutive date  Item  Type
   1            2013-10-23  2013-10-23             ABC   EFG
   1            2013-10-24  2013-10-24             XYZ   WXY
   2            2013-10-28  2013-10-29             ABC   EFG

I am trying to do this with MySQL, I know I can loop through with the aid of PHP, but it is preferable to do it in MySQL if it is at all possible.

Here is a SQLFiddle with some data in it. http://sqlfiddle.com/#!2/63383/1/0

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
Lawrence Cooke
  • 1,567
  • 3
  • 26
  • 52
  • can you post table structure with some sample data on http://sqlfiddle.com – Muhammad Raheel Oct 24 '13 at 05:56
  • This should be relatively easy just doing a join based on StartDate = StartDate + 1 and the item and type fields are the same. However how do you want to cope with multiple consecutive dates? Also what is it that you want to group up (a count?) – Kickstart Oct 24 '13 at 08:45
  • multiple consecutive dates should be grouped together with the other consecutive dates , so if there was 3 consecutive dates, they should all be grouped together, (as long as item and type are the same). yes the other is a count – Lawrence Cooke Oct 24 '13 at 15:29
  • 4
    This a "gaps and islands" problem. See this similar question: **[MySQL group by number series](http://stackoverflow.com/questions/19285596/mysql-group-by-number-series)**. – ypercubeᵀᴹ Oct 26 '13 at 08:48
  • there's a lot of consecutive day posts (or gaps/islands).... [like this](http://stackoverflow.com/a/9278514/623952) and all you need to do is also join by your type and item fields, too... [this post](http://stackoverflow.com/questions/1176011/sql-to-determine-minimum-sequential-days-of-access) is _very intersting_ – gloomy.penguin Oct 26 '13 at 09:04
  • It's unclear what the exact result of the query should be. What do you mean by `the first two would not be grouped` ? Does it mean that the query is expected to return only **one** row: `2,2013-10-28,2013-10-29,ABC,EFG` from the sample data ? – krokodilko Oct 26 '13 at 09:17

3 Answers3

2
SELECT COUNT(*), MIN(startdate), MAX(startdate), item, type FROM (
  SELECT   startdate, item, type,
           @group     := @group + 1 - (
                           type      <=> @last_type
                       AND item      <=> @last_item
                       AND startdate <=> @last_date + INTERVAL 1 DAY
                         ) g,
           @last_type := type,
           @last_item := item,
           @last_date := startdate
  FROM     productinfo, (
             SELECT @group     := 0,
                    @last_type := NULL,
                    @last_item := NULL,
                    @last_date := NULL
           ) init
  ORDER BY type, item, startdate
) t GROUP BY g

See it on sqlfiddle.

eggyal
  • 122,705
  • 18
  • 212
  • 237
  • can you please let me know what does `<=>` stand for? – Akash Oct 26 '13 at 13:41
  • @Akash: [`<=>`](http://dev.mysql.com/doc/en/comparison-operators.html#operator_equal-to) is the `NULL`-safe equality operator. That is, it is the same as `=`, except that it evaluates to `FALSE` if one operand is `NULL` and `TRUE` if both operands are `NULL`. – eggyal Oct 26 '13 at 15:14
  • Thank you for this, this got me going in the right direction of what i needed. I will post the actual query i ended up with, but this was certainly enough to help me though it. much appreciated – Lawrence Cooke Oct 26 '13 at 20:24
  • just for my own eduction, theselect group :=0,last_type:-NULL, this part is just setting default values for those variables? does the "init" word have significance here? also in the top part, group := @group + 1 - (), is the part in brackets are comparison, eg, it ends up being 1 or 0? 1 if a match is found 0 if it isnt? I think i am clear about the rest of it, just never used this method before and would like to completely understand what I am doing here. – Lawrence Cooke Oct 27 '13 at 08:46
  • 1
    @CharlieSmith: Yes, that's correct - that subquery is initialising the user variables; the `init` table alias has no special meaning. The boolean expression (in parentheses) will evaluate to 1 if the expression is true and 0 if the expression is false; thus `@group+1-(...)` will equal `@group` if the expression is true and `@group+1` if the expression is false. One could make that more explicit using [`IF()`](http://dev.mysql.com/doc/en/control-flow-functions.html#function_if) or [`CASE ... END`](http://dev.mysql.com/doc/en/control-flow-functions.html#operator_case) if so desired. – eggyal Oct 27 '13 at 08:54
0

Try this:

SELECT
    DATEDIFF(MAX(startdate), MIN(startdate)) + 1 AS number_of_days,
    MIN(startdate) AS startdate,
    MAX(startdate) AS last_consecutive_date,
    item,
    type
  FROM (
    SELECT @rownum := @rownum + 1 AS row_number,
           pi.*
    FROM
      productinfo pi,
      (SELECT @rownum := 0) r
  ) mydata
GROUP BY item, type, DATE_ADD(startdate, INTERVAL -row_number DAY)
ORDER BY item, type, DATE_ADD(startdate, INTERVAL -row_number DAY)
;

Based on the answer by Andriy M in this thread.

SQLFiddle: SQLFiddle example

Community
  • 1
  • 1
Przemyslaw Kruglej
  • 8,003
  • 2
  • 26
  • 41
0
SELECT DATEDIFF(max(enddate),min(startdate)) + 1 as days, MIN(startdate) as start, MAX(enddate) as end, type, item FROM (
  SELECT   startdate, enddate,item, type,
       @group     := @group + 1 - (
                       type      <=> @last_type
                   AND item      <=> @last_item
                   AND startdate <=> @last_date + INTERVAL 1 DAY
                     ) g,
       @last_type := type,
       @last_item := item,
       @last_date := enddate
  FROM     productinfo, (
         SELECT @group     := 0,
                @last_type := NULL,
                @last_item := NULL,
                @last_date := NULL
       ) init
  ORDER BY type, item, startdate
) t GROUP BY g order by start
Lawrence Cooke
  • 1,567
  • 3
  • 26
  • 52
  • 1
    If you're going to use `enddate` in this fashion (which was not mentioned in your question), you will also need to consider what happens if periods overlap. – eggyal Oct 26 '13 at 20:59
  • You are quite right, I was not completely clear on that part, if was a difficult issue to explain properly. In this particular case, there wont be any overlap due to the data that is going in, but I can see how that would certainly need to be a consideration. Again, that you for your help. and why does stackoverflow force me to wait x hours before awarding bounty when the solution has been given? – Lawrence Cooke Oct 27 '13 at 00:53
  • See [What is the reasoning behind having a delay before you can award a bounty?](http://meta.stackexchange.com/q/145963) – eggyal Oct 27 '13 at 00:56