7

I have query regarding get the dates which are not exists in database table.

I have below dates in database.

2013-08-02
2013-08-02
2013-08-02
2013-08-03
2013-08-05
2013-08-08
2013-08-08
2013-08-09
2013-08-10
2013-08-13
2013-08-13
2013-08-13

and i want the result which is expected as below,

2013-08-01
2013-08-04
2013-08-06
2013-08-07
2013-08-11
2013-08-12

as you can see result has six dates which are not present into database,

i have tried below query

SELECT
    DISTINCT DATE(w1.start_date) + INTERVAL 1 DAY AS missing_date
FROM
    working w1
LEFT JOIN
    (SELECT DISTINCT start_date FROM working ) w2 ON DATE(w1.start_date) = DATE(w2.start_date) - INTERVAL 1 DAY
WHERE
    w1.start_date BETWEEN '2013-08-01' AND '2013-08-13'
AND
    w2.start_date IS NULL;

but above return following result.

2013-08-04
2013-08-14
2013-08-11
2013-08-06

as you can see its giving me back four dates from that 14 is not needed but its still not contain 3 dates its because of left join.

Now please look into my query and let me know what are the best way i can do this?

Thanks for looking and giving time.

Charles
  • 50,943
  • 13
  • 104
  • 142
Dipesh Parmar
  • 27,090
  • 8
  • 61
  • 90
  • "as you can see result has six dates which are not present into database". This makes me laugh. I see millions of dates not in the data (at least since humans recognized what a date is). How do you know what the beginning and end values are? – Gordon Linoff Aug 13 '13 at 11:24
  • By the nature of relational algebra, it is impossible I think to let the DB infer *unspecified* number of missing data. The best thing you could do is probably to find the non-empty date *range* between two dates already present in your table. – Sylvain Leroux Aug 13 '13 at 11:26
  • 3
    @GordonLinoff i think you forget to see query...i already set `w1.start_date BETWEEN '2013-08-01' AND '2013-08-13'` so from that i can say six dates are missing... – Dipesh Parmar Aug 13 '13 at 11:26
  • you may consider comparing to 'ddd' based on either a list of numbers of rownum of all_object – Randy Aug 13 '13 at 17:40

6 Answers6

18

I guess you could always generate the date sequence and just use a NOT IN to eliminate the dates that actually exist. This will max out at a 1024 day range, but is easy to shrink or extend, the date column is called "mydate" and is in the table "table1";

SELECT * FROM (
  SELECT DATE_ADD('2013-08-01', INTERVAL t4+t16+t64+t256+t1024 DAY) day 
  FROM 
   (SELECT 0 t4    UNION ALL SELECT 1   UNION ALL SELECT 2   UNION ALL SELECT 3  ) t4,
   (SELECT 0 t16   UNION ALL SELECT 4   UNION ALL SELECT 8   UNION ALL SELECT 12 ) t16,   
   (SELECT 0 t64   UNION ALL SELECT 16  UNION ALL SELECT 32  UNION ALL SELECT 48 ) t64,      
   (SELECT 0 t256  UNION ALL SELECT 64  UNION ALL SELECT 128 UNION ALL SELECT 192) t256,     
   (SELECT 0 t1024 UNION ALL SELECT 256 UNION ALL SELECT 512 UNION ALL SELECT 768) t1024     
  ) b 
WHERE day NOT IN (SELECT mydate FROM Table1) AND day<'2013-08-13';

From the "I would add an SQLfiddle if it wasn't down" dept.

Thanks for help here is the query i am end up with and its working

SELECT * FROM
(
    SELECT DATE_ADD('2013-08-01', INTERVAL t4+t16+t64+t256+t1024 DAY) missingDates 
        FROM 
    (SELECT 0 t4    UNION ALL SELECT 1   UNION ALL SELECT 2   UNION ALL SELECT 3  ) t4,
    (SELECT 0 t16   UNION ALL SELECT 4   UNION ALL SELECT 8   UNION ALL SELECT 12 ) t16,   
    (SELECT 0 t64   UNION ALL SELECT 16  UNION ALL SELECT 32  UNION ALL SELECT 48 ) t64,      
    (SELECT 0 t256  UNION ALL SELECT 64  UNION ALL SELECT 128 UNION ALL SELECT 192) t256,     
    (SELECT 0 t1024 UNION ALL SELECT 256 UNION ALL SELECT 512 UNION ALL SELECT 768) t1024     
) b 
WHERE
    missingDates NOT IN (SELECT DATE_FORMAT(start_date,'%Y-%m-%d')
            FROM
                working GROUP BY start_date)
    AND
    missingDates < '2013-08-13';
Dipesh Parmar
  • 27,090
  • 8
  • 61
  • 90
Joachim Isaksson
  • 176,943
  • 25
  • 281
  • 294
  • Thanks a lot for hint and code....I used your query and modified some parameters and solved the issues. I am adding my query so it will be helpful to other. – Dipesh Parmar Aug 14 '13 at 05:22
  • well i have one question though...how can i add where condition for checking individual user..? – Dipesh Parmar Aug 14 '13 at 11:19
  • @DipeshParmar I think you'll need to add info how the data is connected to users somehow, in my version of the query I suspect you'll only want to change the subquery to `SELECT start_date FROM working WHERE user=...`, while I'm not sure what the GROUP BY you're using is supposed to do. – Joachim Isaksson Aug 14 '13 at 12:46
  • Interested in how would that look for smaller scope - missing hours and for missing minutes. – Dr. House Jan 29 '20 at 22:46
3

My bet would be probably to create a dedicated Calendar table just to be able to use it on a LEFT JOIN.

You could create the table on per need basis, but as it will not represent a such large amount of data, the simplest and probably most efficient approach is to create it once for all, as I do below using a stored procedure:

--
-- Create a dedicated "Calendar" table
--
CREATE TABLE Calendar (day DATE PRIMARY KEY);

DELIMITER //
CREATE PROCEDURE init_calendar(IN pStart DATE, IN pEnd DATE)
BEGIN
    SET @theDate := pStart;
    REPEAT
        -- Here I use *IGNORE* in order to be able
        -- to call init_calendar again for extend the
        -- "calendar range" without to bother with
        -- "overlapping" dates
        INSERT IGNORE INTO Calendar VALUES (@theDate);
        SET @theDate := @theDate + INTERVAL 1 DAY;
    UNTIL @theDate > pEnd END REPEAT;
END; //
DELIMITER ;

CALL init_calendar('2010-01-01','2015-12-31');

In this example, the Calendar hold 2191 consecutive days, which represent at a roughly estimate less that 15KB. And storing all the dates from the 21th century will represent less that 300KB...

Now, this is your actual data table as described in the question:

--
-- *Your* actual data table
--
CREATE TABLE tbl (theDate DATE);
INSERT INTO tbl VALUES 
    ('2013-08-02'),
    ('2013-08-02'),
    ('2013-08-02'),
    ('2013-08-03'),
    ('2013-08-05'),
    ('2013-08-08'),
    ('2013-08-08'),
    ('2013-08-09'),
    ('2013-08-10'),
    ('2013-08-13'),
    ('2013-08-13'),
    ('2013-08-13');

And finally the query:

--
-- Now the query to find date not "in range"
--

SET @start = '2013-08-01';
SET @end = '2013-08-13';

SELECT Calendar.day FROM Calendar LEFT JOIN tbl
    ON Calendar.day = tbl.theDate
    WHERE Calendar.day BETWEEN @start AND @end
    AND tbl.theDate IS NULL;

Producing:

+------------+
| day        |
+------------+
| 2013-08-01 |
| 2013-08-04 |
| 2013-08-06 |
| 2013-08-07 |
| 2013-08-11 |
| 2013-08-12 |
+------------+
Sylvain Leroux
  • 50,096
  • 7
  • 103
  • 125
2

This is how i would do it:

$db_dates = array (
'2013-08-02',
'2013-08-03',
'2013-08-05',
'2013-08-08',
'2013-08-09',
'2013-08-10',
'2013-08-13'
);
$missing = array();
$month = "08";
$year = "2013";
$day_start = 1;
$day_end = 14
for ($i=$day_start; $i<$day_end; $i++) {
    $day = $i;
    if ($i<10) {
        $day = "0".$i;  
    }
    $check_date = $year."-".$month."-".$day;
    if (!in_array($check_date, $db_dates)) {
        array_push($missing, $check_date);  
    }
}
print_r($missing);

I made it just to that interval but you can just define another interval or make it work for the whole year.

Tiago
  • 383
  • 1
  • 7
1

I'm adding this to the excellent answer by Dipesh if anybody wants more than 1024 days (or hours). I generated below 279936 hours from 2015 to 2046:

    SELECT 
DATE_ADD('2015-01-01', INTERVAL 
POWER(6,6)*t6 + POWER(6,5)*t5 + POWER(6,4)*t4 + POWER(6,3)*t3 + POWER(6,2)*t2 + 
POWER(6,1)*t1 + t0 
HOUR) AS period
FROM
 (SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t0,
 (SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t1,
 (SELECT 0 t2 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t2,
 (SELECT 0 t3 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t3,
 (SELECT 0 t4 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t4,
 (SELECT 0 t5 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t5,
 (SELECT 0 t6 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t6
 ORDER BY period

just plug this into the answer query.

user1693885
  • 493
  • 4
  • 5
0

The way I would solve in this in a datawarehouse-type situation is to populate a "static" table with dates over an appropriate period (there are example scripts for this type of thing which are easy to google) and then left outer join or right outer join your table to it: rows where there are no matches are the missing dates.

davek
  • 22,499
  • 9
  • 75
  • 95
0
DECLARE @date date;
declare @dt_cnt int = 0;
set @date='2014-11-1';
while @date < '2014-12-31'
begin
  select @dt_cnt = COUNT(att_id) from date_table where att_date=@date ;

      if(@dt_cnt = 0) 
      BEGIN
         print @date
      END
      set @date = DATEADD(day,1,@date);
end
jithin john
  • 552
  • 4
  • 12