8

I have this table structure:

EDIT more complex example: add hidden range

category|   day      |   a   |
--------|------------|-------|
1       | 2012-01-01 |   4   |
1       | 2012-01-02 |   4   |
1       | 2012-01-03 |   4   |
1       | 2012-01-04 |   4   |
1       | 2012-01-05 |   5   |
1       | 2012-01-06 |   5   |
1       | 2012-01-07 |   5   |
1       | 2012-01-08 |   4   |
1       | 2012-01-09 |   4   |
1       | 2012-01-10 |   4   |
1       | 2012-01-11 |   5   |
1       | 2012-01-12 |   5   |
1       | 2012-01-16 |   5   |
1       | 2012-01-17 |   5   |
1       | 2012-01-18 |   5   |
1       | 2012-01-19 |   5   |
...

with 'category-day' as unique keys. I would extract a range of dates, for each category, according with column "a" and given limit range, like so:

1,2012-01-01|2012-01-04,4
1,2012-01-05|2012-01-07,5
1,2012-01-08|2012-01-10,4
1,2012-01-11|2012-01-12,5
1,2012-01-13|2012-01-15,0
1,2012-01-16|2012-01-19,5

or similar.

I search the best way for do it. Using only mysql preferably but also with a little bit of php.

NOTE1: not all day are inserted: between two days non-contiguos could not be other days. In this case I would in output the missed range with column "a" = 0.

NOTE2: I did it with a simple query and some rows of php but I don't like it because my simple algorithm need a cycle for each day in range multiplied for each category found. If range is too big and there are too much categories, that's not so good.

FINAL EDIT: OK! After reading all comments and answers, I think not exists a valid, efficient and, at same time, readable solution. So Mosty Mostacho answer is a no 100% valid solution, but it has 100% valid suggestions. Thank you all.

hakre
  • 193,403
  • 52
  • 435
  • 836
Luca Rainone
  • 16,138
  • 2
  • 38
  • 52
  • What would be value for `a`, lets say if choose range `2012-01-01` to `2012-01-09` – safarov Mar 30 '12 at 15:20
  • @safarov 1,2012-01-01|2012-01-04,4 1,2012-01-05|2012-01-07,5 1,2012-01-08|2012-01-09,4 – Luca Rainone Mar 30 '12 at 15:29
  • OK now i understand what you want, buts its really hard to do it with only mysql, half of it can be easly done in php – safarov Mar 30 '12 at 15:38
  • I think you need declare procedure and use loop for check each subgroup. I cant think any other option – safarov Mar 30 '12 at 15:53
  • It's okay also with php, but with an efficient algorithm in case of large range and many categories. – Luca Rainone Mar 30 '12 at 15:55
  • 1
    I think you example is wrong. You meant `2012-01-12,5` for the last value, right? – Mosty Mostacho Apr 06 '12 at 18:51
  • Will this query always need to run for all categories, for all dates between (min(day) from table) and (max(day) from table)? How many categories are/will there be? So in three years time will you be running this for 109500 records (3yrs * 365days * 100cats)? – user1191247 Apr 07 '12 at 19:36
  • @nnichols yes. However this is a simplified table of real question. The real table have also an object_id (for each object_id there are N category, with 1 – Luca Rainone Apr 08 '12 at 08:58

8 Answers8

5

New edit:

As I told you in a comment, I strongly recommend you to use the quick query and then process the missing dates in PHP as that would be faster and more readable:

select
  concat(@category := category, ',', min(day)) col1,
  concat(max(day), ',', @a := a) col2
from t, (select @category := '', @a := '', @counter := 0) init
where @counter := @counter + (category != @category or a != @a)
group by @counter, category, a

However, if you still want to use the query version, then try this:

select
  @counter := @counter + (category != @category or a != @a) counter,
  concat(@category := category, ',', min(day)) col1,
  concat(max(day), ',', @a := a) col2
from (
  select distinct s.day, s.category, coalesce(t1.a, 0) a
  from (
    select (select min(day) from t) + interval val - 1 day day, c.category
    from seq s, (select distinct category from t) c
    having day <= (select max(day) from t)
  ) s
  left join t t1 on s.day = t1.day and s.category = t1.category
  where s.day between (
    select min(day) from t t2
    where s.category = t2.category) and (
    select max(day) from t t2
    where s.category = t2.category)
  order by s.category, s.day
) t, (select @category := '', @a := '', @counter := 0) init
group by counter, category, a
order by category, min(day)

Note that MySQL won't allow you to create data on the fly, unless you hardcode UNIONS, for example. This is an expensive process that's why I strongly suggest you to create a table with only an integer field with values from 1 to X, where X is, at least the maximum amount of dates that separate the min(day) and max(day) from your table. If you're not sure about that date, just add 100,000 numbers and you'll be able to generate range periods for over 200 years. In the previous query, this table is seq and the column it has is val.

This results in:

+--------------+--------------+
|     COL1     |     COL2     |
+--------------+--------------+
| 1,2012-01-01 | 2012-01-04,4 |
| 1,2012-01-05 | 2012-01-07,5 |
| 1,2012-01-08 | 2012-01-10,4 |
| 1,2012-01-11 | 2012-01-12,5 |
| 1,2012-01-13 | 2012-01-15,0 |
| 1,2012-01-16 | 2012-01-19,5 |
+--------------+--------------+

Ok, I'm lying. The result is actually returning a counter column. Just disregard it, as removing it (using a derived table) would be even less performant!

Community
  • 1
  • 1
Mosty Mostacho
  • 42,742
  • 16
  • 96
  • 123
  • 1
    It's good! Thank you. However there is also the NOTE1: it's possibile that some middle days for some category are not inserted. In this case the "not inserted range" should be with column "a" equal 0. I've updated example. (Your answer is for now the best and I will accept it anyway if is not possible with last edit.) – Luca Rainone Apr 07 '12 at 07:30
  • @chumkiu You said the pairs `category-day` are unique and I can have missing days. Does that mean I can have different (and same) days missing for different categories? EG: In your example, can I have `2012-01-01` missing in category `2`? – Mosty Mostacho Apr 07 '12 at 16:34
  • @chumkiu Also note that generating data in a DBMS is not usually a good idea. It will be extremely slow and usually unnecessary as it is actually a data presentation issue. Although it is possible to do so, my advice would be to handle the missing data in a loop in PHP, keeping track of the previous `a`, `category` and `day` so that if the current fetched `day` is not the following of the previous `day` then you have a gap from the previous `day + 1` to the current `day`. IMO, this is by far the best/fastest/most readble solution. If you insist on the query, then answer the preivous question:) – Mosty Mostacho Apr 07 '12 at 19:41
  • Yes. I can have missing day for category `2` but the same day can be inserted for category `1`. I've already suspected your reflection. I asked here for confirm. – Luca Rainone Apr 09 '12 at 09:14
2

and here's a one liner brutality for you :) (Note: Change the "datt" table name.)

select dd.category,
dd.day as start_day,
(select dp.day from 
    (
        select 1 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
            select * from datt where day = d1.day - INTERVAL 1 DAY and a=d1.a
        )
        union
        select 2 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
            select * from datt where day = d1.day + INTERVAL 1 DAY and a=d1.a
        )
    ) dp where dp.day >= dd.day - INTERVAL (n-2) DAY order by day asc limit 0,1) 
as end_day,
dd.a from (
    select 1 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
        select * from datt where day = d1.day - INTERVAL 1 DAY and a=d1.a
    )
    union
    select 2 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
        select * from datt where day = d1.day + INTERVAL 1 DAY and a=d1.a
    )
) dd
where n=1

and it's output is :

|| 1 || 2012-01-01 || 2012-01-01 || 4 ||
|| 1 || 2012-01-03 || 2012-01-04 || 4 ||
|| 1 || 2012-01-05 || 2012-01-07 || 5 ||
|| 1 || 2012-01-08 || 2012-01-10 || 4 ||
|| 1 || 2012-01-11 || 2012-01-12 || 5 ||

Note: Thats the result for non-existing 2012-01-02 in a 01-12 day table.

Taha Paksu
  • 15,371
  • 2
  • 44
  • 78
  • it works, thank you. Your answer could also solve the `note1` with a fix. See updated example :) – Luca Rainone Apr 07 '12 at 07:59
  • wait, I didn't see the last sentence. :S – Taha Paksu Apr 07 '12 at 08:05
  • the last I could get. It gets the existing ranges but the missing ranges are not so easy because you need to compare with a full day table list to get the missing ones. Thats for PHP side I think. – Taha Paksu Apr 07 '12 at 08:30
2

No need for PHP or temporary tables or anything.

DISCLAIMER: I did this just for fun. This stunt may be too crazy to be used in a production environment. Therefore I'm not posting this as a "real" solution. Also I'm not willing to explain how it works :) And I didn't rethink / refactor it. There might be more elegant ways and names / aliases could be more informative. So please no flame or anything.

Here's my solution. Looks more complicated than it is. I think it may be easier to understand than other answers, no offense :)

Setting up test data:

drop table if exists test;
create table test(category int, day date, a int);
insert into test values
(1       , '2012-01-01' ,   4   ),
(1       , '2012-01-02' ,   4   ),
(1       , '2012-01-03' ,   4   ),
(1       , '2012-01-04' ,   4   ),
(1       , '2012-01-05' ,   5   ),
(1       , '2012-01-06' ,   5   ),
(1       , '2012-01-07' ,   5   ),
(1       , '2012-01-08' ,   4   ),
(1       , '2012-01-09' ,   4   ),
(1       , '2012-01-10' ,   4   ),
(1       , '2012-01-11' ,   5   ),
(1       , '2012-01-12' ,   5   ),
(1       , '2012-01-16' ,   5   ),
(1       , '2012-01-17' ,   5   ),
(1       , '2012-01-18' ,   5   ),
(1       , '2012-01-19' ,   5   );

And here it comes:

SELECT category, MIN(`day`) AS firstDayInRange, max(`day`) AS lastDayInRange, a
, COUNT(*) as howMuchDaysInThisRange /*<-- as a little extra*/
FROM
(
SELECT 
IF(@prev != qr.a, @is_a_changing:=@is_a_changing+1, @is_a_changing) AS is_a_changing, @prev:=qr.a, qr.* /*See if column a has changed. If yes, increment, so we can GROUP BY it later*/
FROM
(
SELECT 
test.category, q.`day`, COALESCE(test.a, 0) AS a /*When there is no a, replace NULL with 0*/
FROM
test
RIGHT JOIN
(
SELECT
DATE_SUB(CURDATE(), INTERVAL number_days DAY) AS `day` /*<-- Create dates from now back 999 days. This query is surprisingly fast. And adding more numbers to create more dates, i.e. 10000 dates is also no problem. Therefor a temporary dates table might not be necessary?*/
FROM
(
SELECT (a + 10*b + 100*c) AS number_days FROM
  (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) aa
, (SELECT 0 AS b UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) bb
, (SELECT 0 AS c UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) cc
)sq /*<-- This generates numbers 0 to 999*/
)q USING(`day`) 
, (SELECT @is_a_changing:=0, @prev:=0) r
/*This WHERE clause is just to beautify. It may not be necessary*/
WHERE q.`day` >= (SELECT MIN(test.`day`) FROM test) AND q.`day` <= (SELECT MAX(test.`day`) FROM test) 
)qr
)asdf
GROUP BY is_a_changing
ORDER BY 2

Result looks like this:

category    firstDayInRange     lastDayInRange      a   howMuchDaysInThisRange
--------------------------------------------------------------------------
1           2012-01-01          2012-01-04          4   4
1           2012-01-05          2012-01-07          5   3
1           2012-01-08          2012-01-10          4   3
1           2012-01-11          2012-01-12          5   2
            2012-01-13          2012-01-15          0   3
1           2012-01-16          2012-01-19          5   4
fancyPants
  • 50,732
  • 33
  • 89
  • 96
  • There are basically 2 problems with creating the calendar table on the fly istead of having it as a table in your db: 1) You have a limit of 1000 days which is less than 3 years 2) Increasing that number will make things exponentially slower 3) If you have to run this more than once, then you would have created the derived table many times, which is even less efficient. Btw, I removed your "optional" `where` clause and got a 1000 day period :) By far, the best solution is querying with a one-pass query the whole data and creating the missing data in PHP. – Mosty Mostacho Apr 12 '12 at 15:40
  • @MostyMostacho 1) True 2) Like I said in the comments to the query, I think it's very, very fast 3) Really, I don't see a problem here. If your server gets a problem with such a query, even IF you have to run this more than once, then you better rethink your setup. And when you get a 1000 day period, I don't know, maybe something's wrong with your test data? Using the data given in the question (and my answer) this works. The "optional" where clause filters out the rows for date range prior 2011-12-31 and after 2012-01-20. Like it says, "just to beautify". – fancyPants Apr 13 '12 at 07:54
  • +1 nice query :). But it's more slow than my initial solution (simple query mysql and some rows of php). So I can't accept it. – Luca Rainone Apr 15 '12 at 17:26
1

Firstly, this is an extension of @Mosty's solution.

To enable Mosty's solution to include category/date combinations than do not exist in the table I took the following approach -

Start by getting a distinct list of categories and then join this to the entire date range -

SELECT category, `start` + INTERVAL id DAY AS `day`
FROM dummy,(SELECT DISTINCT category FROM t) cats, (SELECT MIN(day) `start`, MAX(day) `end` FROM t) tmp
WHERE id <= DATEDIFF(`end`, `start`)
ORDER BY category, `day`

The above query builds the full date range using the table dummy with a single field id. The id field contains 0,1,2,3,.... - it needs to have enough values to cover every day in the required date range. This can then be joined back to the original table to create a complete list of all categories for all dates and the appropriate value for a -

SELECT cj.category, cj.`day`, IFNULL(t.a, 0) AS a
FROM (
    SELECT category, `start` + INTERVAL id DAY AS `day`
    FROM dummy,(SELECT DISTINCT category FROM t) cats, (SELECT MIN(day) `start`, MAX(day) `end` FROM t) tmp
    WHERE id <= DATEDIFF(`end`, `start`)
    ORDER BY category, `day`
) AS cj
LEFT JOIN t
    ON cj.category = t.category
    AND cj.`day` = t.`day`

This can then be applied to Mosty's query in place of table t -

SELECT
    CONCAT(@category := category, ',', MIN(`day`)) col1,
    CONCAT(MAX(`day`), ',', @a := a) col2
FROM (
    SELECT cj.category, cj.day, IFNULL(t.a, 0) AS a
    FROM (
        SELECT category, `start` + INTERVAL id DAY AS `day`
        FROM dummy,(SELECT DISTINCT category FROM t) cats, (SELECT MIN(day) `start`, MAX(day) `end` FROM t) tmp
        WHERE id <= DATEDIFF(`end`, `start`)
        ORDER BY category, `day`
    ) AS cj
    LEFT JOIN t
        ON cj.category = t.category
        AND cj.`day` = t.day) AS t, (select @category := '', @a := '', @counter := 0) init
WHERE @counter := @counter + (category != @category OR a != @a)
GROUP BY @counter, category, a
user1191247
  • 10,808
  • 2
  • 22
  • 32
  • Mmmh it not works properly. It exclude first day of first range and it exclude some range after an empty range. I still not understand the logic of this error. – Luca Rainone Apr 08 '12 at 17:38
  • It will only exclude the first day if the id range does not contain 0 as the first value. You will have to provide a specific example of where you think it is failing. If you run the individual parts of the query separately you should be able to understand what it is doing. If not, please ask specific questions. – user1191247 Apr 08 '12 at 18:16
  • I should make a temporary table for table dummy? – Luca Rainone Apr 09 '12 at 09:10
  • 1
    If you don't have another table with a contiguous set covering the required range of values (0,1,2,3,...4999,5000 would be useful) then just create another table. Creating it as a temporary table each time you need it would be an unnecessary overhead. You should also try Mosty's suggestion of writing a similar solution in PHP instead of SQL. – user1191247 Apr 09 '12 at 09:21
1

Completely on mysql side will have performance adv: Once the procedure has been created, it runs within 0.35 - 0.37 sec

create procedure fetch_range()
begin
declare min date;
declare max date;

create  table testdate(
    date1 date
);

select min(day) into min
from category;

select max(day) into max
from category;

while min <= max do

insert into testdate values(min);
set min = adddate(min,1);
end while;

select concat(category,',',min(day)),concat(max(day),',',a) 
from(
SELECT if(isNull(category),@category,category) category,if(isNull(day),date1,day) day,@a,if(isNull(a) || isNull(@a),if(isNull(a) && isNull(@a),@grp,@grp:=@grp+1),if(@a!=a,@grp:=@grp+1,@grp)) as sor_col,if(isNull(a),0,a) as a,@a:=a,@category:= category
FROM  `category` 
RIGHT JOIN testdate ON date1 = category.day) as table1
group by sor_col;

drop table testdate;

end 

o/p:

1,2012-01-01|2012-01-04,4
1,2012-01-05|2012-01-07,5
1,2012-01-08|2012-01-10,4
1,2012-01-11|2012-01-12,5
1,2012-01-13|2012-01-15,0
1,2012-01-16|2012-01-19,5

Here is mysql solution which will give the desired result excluding the missed range only.

PHP: The missing range can be added through php.

$sql = "set @a=0,@grp=0,@datediff=0,@category=0,@day='';";
mysql_query($sql);

$sql= "select category,min(day)min,max(day) max,a
from(
select category,day,a,concat(if(@a!=a,@grp:=@grp+1,@grp),if(datediff(@day,day) < -1,@datediff:=@datediff+1,@datediff)) as grp_datediff,datediff(@day,day)diff, @day:= day,@a:=a
FROM  category
order by day)as t
group by grp_datediff";

$result = mysql_query($sql);

$diff = 0;
$indx =0;
while($row = mysql_fetch_object($result)){
    if(isset($data[$indx - 1]['max'])){
    $date1 = new DateTime($data[$indx - 1]['max']);
    $date2 =  new DateTime($row->min);
    $diff = $date1->diff($date2);
    }
    if ($diff->days > 1) {

        $date = new DateTime($data[$indx-1]['max']);
        $interval = new DateInterval("P1D");
        $min = $date->add($interval);

        $date = new DateTime($data[$indx-1]['max']);
        $interval = new DateInterval("P".$diff->days."D");
        $max = $date->add($interval);

        $data[$indx]['category'] = $data[$indx-1]['category'];
        $data[$indx]['min'] = $min->format('Y-m-d');
        $data[$indx]['max'] = $max->format('Y-m-d');
        $data[$indx++]['a'] = 0;

         $data[$indx]['category'] = $row->category;
    $data[$indx]['min'] = $row->min;
    $data[$indx]['max'] = $row->max;
    $data[$indx]['a'] = $row->a;
    }else{


    $data[$indx]['category'] = $row->category;
    $data[$indx]['min'] = $row->min;
    $data[$indx]['max'] = $row->max;
    $data[$indx]['a'] = $row->a;
    }

$indx++;
}
Angelin Nadar
  • 8,944
  • 10
  • 43
  • 53
1

To make this work as you want it to, you should have two tables:

  1. for periods
  2. for days

Where each period can have many days related to it through FOREIGN KEY. With current table structure, the best you can do is to detect the continuous periods on PHP side.

tereško
  • 58,060
  • 25
  • 98
  • 150
  • 1
    unfortunately database structure cannot be modified. Thank you for suggest. The question now is: on php side, it's need to do a cycle for each days in given range? – Luca Rainone Mar 30 '12 at 15:27
0

Is this what you mean?

SELECT
    category,
    MIN(t1.day),
    MAX(t2.day),
    a
FROM
    `table` AS t1
INNER JOIN `table` AS t2 USING (category, a)
Michael
  • 11,912
  • 6
  • 49
  • 64
0

If I understand your question correctly, I would use something to the effect of:

SELECT MAX(day), MIN(day) FROM `YourTable` WHERE `category`= $cat AND `A`= $increment;

... and ...

$dateRange = $cat.","."$min"."|"."$max".",".$increment;
Thomas Wright
  • 1,309
  • 1
  • 9
  • 15