2

Ok, i have a table with a date column and a integer column, and i want to retrieve all the rows grouped by date's day within a certain date range; since there are not rows for every day, is it possible to make mysql return rows for those days with a default value?

example

source table:

date         value
2020-01-01   1
2020-01-01   2
2020-01-03   2
2020-01-07   3
2020-01-08   4
2020-01-08   1

Standard behaviour after grouping by date and summing values:

2020-01-01   3
2020-01-03   2
2020-01-07   3
2020-01-08   5

Desired behaviour/result with empty rows:

2020-01-01   3
2020-01-02   0
2020-01-03   2
2020-01-04   0
2020-01-05   0
2020-01-06   0
2020-01-07   3
2020-01-08   5
fudo
  • 2,254
  • 4
  • 22
  • 44
  • I think this question can help you solve your issue: https://stackoverflow.com/questions/2157282/generate-days-from-date-range – Pepper Jan 20 '20 at 15:09

2 Answers2

2

You can do something like the below:

# table creation:

drop table if exists test_table;

create table test_table (your_date date, your_value int(11));
insert into test_table (your_date, your_value) values ('2020-01-01', 1);
insert into test_table (your_date, your_value) values ('2020-01-01', 2);
insert into test_table (your_date, your_value) values ('2020-01-03', 2);
insert into test_table (your_date, your_value) values ('2020-01-07', 3);
insert into test_table (your_date, your_value) values ('2020-01-08', 4);
insert into test_table (your_date, your_value) values ('2020-01-08', 1);

This creates a list of basically all the dates. You then filter for the dates your interested in, join with your table and group.

You could also replace the dates in the where statement with subqueries (min and max date of your table) to make it dynamic

It's a bit of a work-around but it works.

select sbqry.base_date, sum(ifnull(t.your_value, 0))
from (select adddate('1970-01-01',t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) base_date from
    (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
    (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
    (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
    (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
    (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) sbqry
left join test_table t on base_date = t.your_date
where sbqry.base_date between '2020-01-01' and '2020-01-08'
group by sbqry.base_date;

input:

+------------+------------+
| your_date  | your_value |
+------------+------------+
| 2020-01-01 |          1 |
| 2020-01-01 |          2 |
| 2020-01-03 |          2 |
| 2020-01-07 |          3 |
| 2020-01-08 |          4 |
| 2020-01-08 |          1 |
+------------+------------+

output:

+------------+------------------------------+
| base_date  | sum(ifnull(t.your_value, 0)) |
+------------+------------------------------+
| 2020-01-01 |                            3 |
| 2020-01-02 |                            0 |
| 2020-01-03 |                            2 |
| 2020-01-04 |                            0 |
| 2020-01-05 |                            0 |
| 2020-01-06 |                            0 |
| 2020-01-07 |                            3 |
| 2020-01-08 |                            5 |
+------------+------------------------------+
LeroyFromBerlin
  • 395
  • 3
  • 12
  • it works but it doesn't scale well since i don't know in advance the interval, i was looking for something more procedural/automatic if such a feature exists – fudo Jan 20 '20 at 15:57
  • how is it going to be determined? Possibly you can just replace the hard-coded part of the where statement by a subquery.. – LeroyFromBerlin Jan 20 '20 at 15:59
  • sorry, i incorrectly read your code, it fits exactly what i need even if i'm finding it hard to understand the logic behind it, thanks – fudo Jan 20 '20 at 16:09
  • @fudo The big subquery in the `FROM` part generates every single date starting from '1970-01-01' and including the next 100'000 days. – Pepper Jan 20 '20 at 16:56
2

You could also achieve what you want with the following query which may be easier to understand :

SELECT
     date_table.date,
     IFNULL(SUM(value),0) as sum_val
FROM (
     SELECT DATE_ADD('2020-01-01', INTERVAL (@i:=@i+1)-1 DAY) AS `date`
     FROM information_schema.columns,(SELECT @i:=0) gen_sub
     WHERE DATE_ADD('2020-01-01',INTERVAL @i DAY) BETWEEN '2020-01-01' AND '2020-01-08'
) date_table
LEFT JOIN test ON test.date_value = date_table.date
GROUP BY date;

FIND A DEMO HERE

You could set some variable to fix min and max dates :

SET @date_min = '2020-01-01';
SET @date_max = '2020-01-08';

SELECT DATE_ADD(@date_min, INTERVAL (@i:=@i+1)-1 DAY) AS `date`
FROM information_schema.columns, (SELECT @i:=0) gen_sub
WHERE DATE_ADD(@date_min, INTERVAL @i DAY) BETWEEN @date_min AND @date_max

Some explanation :

In fact, your question encourage us to generate a set of dates because we are looking to 'left join' 'your table' with a continuous set of date in order to match dates with no records in 'your table'.

This would be pretty easy in PostgreSQL because of generate_series function but this is not that easy in MySQL as such a useful function doesn't exist. That's why we need to be smart.

Both solutions here have the same logic behind it : I mean they are both incrementing a date value (day per day) for each row joined in another table, let's call it 'source table'. In the answer above (not mine), 'source table' is made with many unions and cross joins (it generates 100k rows), in my case here 'source table' is 'information_schema.columns' which already contains lots of rows (1800+).

In above case, initial date is fixed to 1970-01-01 and then it will increment this date 100 000 times in order to have a set of 100 000 dates beginning with 1970-01-01.

In my case, initial date is fixed to your min range date, 2020-01-01, and then it will increment this date for each row found in information_schema.columns, so around 1800 times. You will end with a set of around 1800 dates beginning with 2020-01-01.

Finally, you can left join your table with this generated set of dates (whatever the way to do it) in order to sum(value) for each day in your desired range.

Hope that would help you understand the logic behind both queries ;)

Gosfly
  • 1,240
  • 1
  • 8
  • 14
  • yeah, this is a more procedural approach – fudo Jan 20 '20 at 16:33
  • may you explain the logic of your answer? I'm trying to reproduce it to create single integer rows with value between `0` and `n`, but i'm struggling – fudo Jan 21 '20 at 08:23
  • What are you trying to achieve exactly ? You only want dates with sum(value) between 0 and n ? I don't get it – Gosfly Jan 21 '20 at 08:25
  • i copy-pasted your original answer since it fits exactly my need, i just tried to understand its logic making a simpler version which returns `n` rows with an incrementing value starting from `a` to `b`; i managed to do this, but i'm still wondering how it works, more expecially why `information_schema.columns` is involved – fudo Jan 21 '20 at 08:37
  • 1
    You can now find my explanation above ! – Gosfly Jan 21 '20 at 09:09