How to query date range date by date as rows

Question

My table presences holdes date ranges (start and end) together with an employee and location id, so that I am able to store an employee's presence at a location in a specific date range:

CREATE TABLE `presences` (
  id int(11) auto_increment not null,
  start timestamp null default null,
  end timestamp null default null,
  employee int(11) null default null,
  location int(11) null default null,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/* Example data: */
INSERT INTO `presences` (start, end, employee, location) VALUES
("2022-04-05 00:00:00", "2022-04-05 00:00:00", 1, 1),
("2022-04-06 00:00:00", "2022-04-07 00:00:00", 1, 1),
("2022-04-07 00:00:00", "2022-04-10 00:00:00", 1, 1),
("2022-01-01 00:00:00", "2022-12-31 00:00:00", 1, 1);

In my app, I am querying the table with a given date range and I receive rows where at least one day of the tables date range fits into the provided date range:

SELECT *
FROM `presences`
WHERE DATE(start) <= '2022-04-06' AND DATE(end) >= '2022-04-04'

Is there a way to split the resulting rows date range? E.g. the provide range is 2022-04-04 to 2022-04-06 and a resulting row has a date range 2022-01-01 to 2022-12-31. The range of the result that fits in provide date range is 2022-04-04 to 2022-04-06, thus I want to split that range in a way, that there is not one resulting row, but one row per date in range like:

id | employee | location |    date
--------------------------------------
 4 |     1    |     1    | 2022-04-04 
 4 |     1    |     1    | 2022-04-05
 4 |     1    |     1    | 2022-04-06

If the provided range is 2022-04-01 to 2022-04-06 and my resulting row is 2022-04-05 to 2022-04-06, I want to see

id | employee | location |    date
--------------------------------------
 X |     Y    |     Z    | 2022-04-05
 X |     Y    |     Z    | 2022-04-06

Fiddle: http://sqlfiddle.com/#!9/0d4e64/1

Note that adding a link to a live example *in addition* to having code in the question is welcomed, but all relevant code should at least be posted in the question itself. For one thing, external sites may be down (as is the case as I post this comment). — outis, Mar 20 '22 at 10:24
Questions should include both desired results and actual results (if any). Please read "[ask]" for more. — outis, Mar 20 '22 at 10:28
sample of original data and expected results would help! Your fiddler has two date columns (start, end)... the sample you show above has one column. You need to edit and revise to make it more clearer. — nazim, Mar 20 '22 at 10:38
Wow my question woke up the entire stack overflow police lol Anyhow thank you for your feedback. I rephrased the entire question and hope that depicts more clearly where I need help and makes answering much more efficient. — PeterPan, Mar 20 '22 at 11:35
@PeterPan - Heh, always nice to see someone being a good sport about folks trying to help (and actually providing the requested info!) :-) To answer your question, assuming you're using MySQL 8.x, you could use a CTE, but creating a permanent calendar table is a better way to go here. Tbh, there's tons of examples of this on S.O. you can adapt, but here's a quick demo using your tables https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=5587487563e9f598a7084b75325b03ee — SOS, Mar 20 '22 at 19:30
... it's probably not exactly what you want, but gives you something to work with — SOS, Mar 20 '22 at 19:56
Note that there are two questions here: how to generating dates and how to combine those dates with the results. The former is already asked & answered: "[Generate days from date range sql](https://stackoverflow.com/q/510012/90527)", "[generate days from date range](https://stackoverflow.com/q/2157282/90527)". This question should be narrowed in focus. — outis, Mar 20 '22 at 22:36
Thank you. I don't get the point of that calendar table. As I understand, this is a table that holds almost any date. Is this just to have all date related information like `isWeekend`, `isHolidy`, `dayName` and stuff? And... How do I keep this calendar up-to-date? Do I add the next three years for example and with an event scheduler all the remaining days with the time going by? — PeterPan, Apr 03 '22 at 19:32

outis · Accepted Answer · 2022-03-20T23:36:32.177

With a suitable source of dates (as given by answers to "Get a list of dates between two dates" or generate days from date range), the dates can be filtered based on the query date range (e.g. '2022-04-04' and '2022-04-06') and joined to the rows in the results. Each date can be joined to each row; to limit these to dates in the interval (start, end) for each result row from presences, simply join between the start & end timestamps:

SET @after := '2022-04-04',
    @before := '2022-04-06';

SELECT *
  FROM `presences` AS p
    JOIN (SELECT * FROM `days` WHERE `day` BETWEEN @after AND @before)

      AS d ON d.`day` BETWEEN Date(p.`start`) AND Date(p.`end`) -- The core of the solution

  WHERE Date(start) <= @before AND Date(end) >= @after
  ORDER BY id, d.`day`
  ;

The above uses a sub-select to filter the days. This can also be done by filtering in the join, or the WHERE clause (as can also be seen in a fiddle):

SELECT p.id, p.employee, p.location, d.day
  FROM `presences` AS p
    JOIN `days` AS d ON d.`day` BETWEEN Date(p.`start`) AND Date(p.`end`)
  WHERE Date(`start`) <= @before AND Date(`end`) >= @after
    AND d.day BETWEEN @after AND @before
  ORDER BY id, d.`day`
  ;
  
  
SELECT p.id, p.employee, p.location, d.day
  FROM `presences` AS p
    JOIN `days` AS d 
      ON    d.`day` BETWEEN @after AND @before 
        AND d.`day` BETWEEN Date(p.`start`) AND Date(p.`end`)
  WHERE Date(`start`) <= @before AND Date(`end`) >= @after
  ORDER BY id, d.`day`
  ;

Which to use is a matter of readability and potentially efficiency. For the example queries above, an EXPLAIN of each shows the same execution plan, so there shouldn't be any efficiency differences:

+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                                           |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------------------------------------------------+
|  1 | SIMPLE      | p     | NULL       | ALL   | NULL          | NULL | NULL    | NULL |    4 |      100 | Using where; Using temporary; Using filesort                    |
|  1 | SIMPLE      | d     | NULL       | range | day           | day  | 4       | NULL |    4 |      100 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------------------------------------------------+

A Note on Column Types

If the start and end columns will actually only ever hold dates (i.e. the time portion will always be 00:00:00), they can be redefined as the DATE type, and the calls to the Date() function can be removed from the query, simplifying it. You could get away with not redefining the columns and merely removing the calls to Date(), as the various date-time columns are comparable as-is, though this may cause problems should any of the TIMESTAMP columns have a time other than 00:00:00.

The primary advantage of removing functions is to allow indices (here, on start and end) to be used. The example schema didn't index start and end, so there wouldn't be any difference in execution time or plans. If, however, you add an index:

CREATE INDEX presence_interval ON presences (`start`, `end`);

then the execution plans for the queries with the calls to Date() removed are all:

+----+-------------+-------+------------+-------+-------------------+-------------------+---------+------+------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys     | key               | key_len | ref  | rows | filtered | Extra                                                           |
+----+-------------+-------+------------+-------+-------------------+-------------------+---------+------+------+----------+-----------------------------------------------------------------+
|  1 | SIMPLE      | p     | NULL       | range | presence_interval | presence_interval | 5       | NULL |    2 |    33.33 | Using index condition; Using temporary; Using filesort          |
|  1 | SIMPLE      | d     | NULL       | range | day               | day               | 4       | NULL |    4 |      100 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+-------+-------------------+-------------------+---------+------+------+----------+-----------------------------------------------------------------+

Note there are fewer rows from the presences column, and the filtered percentage is cut by 2/3.

How to query date range date by date as rows

1 Answers1

A Note on Column Types