return empty rows for not existsting data

Question

Ok, i have a table with a date column and a integer column, and i want to retrieve all the rows grouped by date's day within a certain date range; since there are not rows for every day, is it possible to make mysql return rows for those days with a default value?

example

source table:

date         value
2020-01-01   1
2020-01-01   2
2020-01-03   2
2020-01-07   3
2020-01-08   4
2020-01-08   1

Standard behaviour after grouping by date and summing values:

2020-01-01   3
2020-01-03   2
2020-01-07   3
2020-01-08   5

Desired behaviour/result with empty rows:

2020-01-01   3
2020-01-02   0
2020-01-03   2
2020-01-04   0
2020-01-05   0
2020-01-06   0
2020-01-07   3
2020-01-08   5

I think this question can help you solve your issue: https://stackoverflow.com/questions/2157282/generate-days-from-date-range — Pepper, Jan 20 '20 at 15:09

LeroyFromBerlin · Accepted Answer · 2020-01-20T15:47:18.860

You can do something like the below:

# table creation:

drop table if exists test_table;

create table test_table (your_date date, your_value int(11));
insert into test_table (your_date, your_value) values ('2020-01-01', 1);
insert into test_table (your_date, your_value) values ('2020-01-01', 2);
insert into test_table (your_date, your_value) values ('2020-01-03', 2);
insert into test_table (your_date, your_value) values ('2020-01-07', 3);
insert into test_table (your_date, your_value) values ('2020-01-08', 4);
insert into test_table (your_date, your_value) values ('2020-01-08', 1);

This creates a list of basically all the dates. You then filter for the dates your interested in, join with your table and group.

You could also replace the dates in the where statement with subqueries (min and max date of your table) to make it dynamic

It's a bit of a work-around but it works.

select sbqry.base_date, sum(ifnull(t.your_value, 0))
from (select adddate('1970-01-01',t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) base_date from
    (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
    (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
    (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
    (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
    (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) sbqry
left join test_table t on base_date = t.your_date
where sbqry.base_date between '2020-01-01' and '2020-01-08'
group by sbqry.base_date;

input:

+------------+------------+
| your_date  | your_value |
+------------+------------+
| 2020-01-01 |          1 |
| 2020-01-01 |          2 |
| 2020-01-03 |          2 |
| 2020-01-07 |          3 |
| 2020-01-08 |          4 |
| 2020-01-08 |          1 |
+------------+------------+

output:

+------------+------------------------------+
| base_date  | sum(ifnull(t.your_value, 0)) |
+------------+------------------------------+
| 2020-01-01 |                            3 |
| 2020-01-02 |                            0 |
| 2020-01-03 |                            2 |
| 2020-01-04 |                            0 |
| 2020-01-05 |                            0 |
| 2020-01-06 |                            0 |
| 2020-01-07 |                            3 |
| 2020-01-08 |                            5 |
+------------+------------------------------+

it works but it doesn't scale well since i don't know in advance the interval, i was looking for something more procedural/automatic if such a feature exists — fudo, Jan 20 '20 at 15:57
how is it going to be determined? Possibly you can just replace the hard-coded part of the where statement by a subquery.. — LeroyFromBerlin, Jan 20 '20 at 15:59
sorry, i incorrectly read your code, it fits exactly what i need even if i'm finding it hard to understand the logic behind it, thanks — fudo, Jan 20 '20 at 16:09
@fudo The big subquery in the `FROM` part generates every single date starting from '1970-01-01' and including the next 100'000 days. — Pepper, Jan 20 '20 at 16:56

Gosfly · Answer 2 · 2020-01-21T09:09:19.487

You could also achieve what you want with the following query which may be easier to understand :

SELECT
     date_table.date,
     IFNULL(SUM(value),0) as sum_val
FROM (
     SELECT DATE_ADD('2020-01-01', INTERVAL (@i:=@i+1)-1 DAY) AS `date`
     FROM information_schema.columns,(SELECT @i:=0) gen_sub
     WHERE DATE_ADD('2020-01-01',INTERVAL @i DAY) BETWEEN '2020-01-01' AND '2020-01-08'
) date_table
LEFT JOIN test ON test.date_value = date_table.date
GROUP BY date;

FIND A DEMO HERE

You could set some variable to fix min and max dates :

SET @date_min = '2020-01-01';
SET @date_max = '2020-01-08';

SELECT DATE_ADD(@date_min, INTERVAL (@i:=@i+1)-1 DAY) AS `date`
FROM information_schema.columns, (SELECT @i:=0) gen_sub
WHERE DATE_ADD(@date_min, INTERVAL @i DAY) BETWEEN @date_min AND @date_max

Some explanation :

In fact, your question encourage us to generate a set of dates because we are looking to 'left join' 'your table' with a continuous set of date in order to match dates with no records in 'your table'.

This would be pretty easy in PostgreSQL because of generate_series function but this is not that easy in MySQL as such a useful function doesn't exist. That's why we need to be smart.

Both solutions here have the same logic behind it : I mean they are both incrementing a date value (day per day) for each row joined in another table, let's call it 'source table'. In the answer above (not mine), 'source table' is made with many unions and cross joins (it generates 100k rows), in my case here 'source table' is 'information_schema.columns' which already contains lots of rows (1800+).

In above case, initial date is fixed to 1970-01-01 and then it will increment this date 100 000 times in order to have a set of 100 000 dates beginning with 1970-01-01.

In my case, initial date is fixed to your min range date, 2020-01-01, and then it will increment this date for each row found in information_schema.columns, so around 1800 times. You will end with a set of around 1800 dates beginning with 2020-01-01.

Finally, you can left join your table with this generated set of dates (whatever the way to do it) in order to sum(value) for each day in your desired range.

Hope that would help you understand the logic behind both queries ;)

may you explain the logic of your answer? I'm trying to reproduce it to create single integer rows with value between `0` and `n`, but i'm struggling — fudo, Jan 21 '20 at 08:23
What are you trying to achieve exactly ? You only want dates with sum(value) between 0 and n ? I don't get it — Gosfly, Jan 21 '20 at 08:25
i copy-pasted your original answer since it fits exactly my need, i just tried to understand its logic making a simpler version which returns `n` rows with an incrementing value starting from `a` to `b`; i managed to do this, but i'm still wondering how it works, more expecially why `information_schema.columns` is involved — fudo, Jan 21 '20 at 08:37

return empty rows for not existsting data

example

2 Answers2

Linked