I was trying to generate a report for a given date range from the following table.
table_columns => employee_id |date | status
where status 1 = not_visited, 2 = visited, 3 = canceled, 4 = pending (for approval) The report should look like the following:
+-------------+------------+-------+-------------+---------+----------+---------+
| employee_id | date | total | not_visited | visited | canceled | pending |
+-------------+------------+-------+-------------+---------+----------+---------+
| 3 | 2021-06-01 | 10 | 10 | 0 | 0 | 0 |
| 3 | 2021-06-02 | 22 | 10 | 2 | 10 | 0 |
| 3 | 2021-06-03 | 10 | 10 | 0 | 0 | 0 |
| 3 | 2021-06-05 | 11 | 10 | 1 | 0 | 0 |
| 4 | 2021-06-01 | 11 | 8 | 3 | 0 | 0 |
| 5 | 2021-06-01 | 10 | 1 | 9 | 0 | 0 |
+-------------+------------+-------+-------------+---------+----------+---------+
The query for this report is:
select va.employee_id, va.date,
count(*) as total,
sum(case when status = 1 then 1 else 0 end) as not_visited,
sum(case when status = 2 then 1 else 0 end) as visited,
sum(case when status = 3 then 1 else 0 end) as canceled,
sum(case when status = 4 then 1 else 0 end) as pending
from visiting_addresses va
where va.date >= '2021-06-01'
and va.date <= '2021-06-30'
group by va.employee_id, va.date;
If you look at the result, there is no entry for date 2021-06-04
for employee_id = 3. Also there is no data from 2021-06-06 to 2021-06-30 . I will have to include this dates on the result. So I tried to create another query that will generate dates between the given range. The following query will do that
SELECT gen_date
FROM
(SELECT v.gen_date
FROM
(SELECT ADDDATE('1970-01-01',t4 * 10000 + t3 * 1000 + t2 * 100 + t1 * 10 + t0) gen_date
FROM
(SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t0,
(SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t1,
(SELECT 0 t2 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t2,
(SELECT 0 t3 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t3,
(SELECT 0 t4 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t4
) v
WHERE v.gen_date BETWEEN '2021-06-01' AND '2021-06-30'
) calendar;
This query will generate dates like the following:
+------------+
| gen_date |
+------------+
| 2021-06-01 |
| 2021-06-02 |
| 2021-06-03 |
| .......... |
| ...........|
| 2021-06-27 |
| 2021-06-28 |
| 2021-06-29 |
| 2021-06-30 |
+------------+
Now The question is, how do I join this above two queries in a way so that for each employee_id, all dates are present in the result? Or Is it even possible in this way? (The actual table contains 5 million rows. employee_id column has a cardinality of 3k++, date and employee_id columns are indexed)