23

In Python how do I find all the missing days in a sorted list of dates?

Vishal
  • 19,879
  • 23
  • 80
  • 93

10 Answers10

32

using sets

>>> from datetime import date, timedelta
>>> d = [date(2010, 2, 23), date(2010, 2, 24), date(2010, 2, 25),
         date(2010, 2, 26), date(2010, 3, 1), date(2010, 3, 2)]
>>> date_set = set(d[0] + timedelta(x) for x in range((d[-1] - d[0]).days))
>>> missing = sorted(date_set - set(d))
>>> missing
[datetime.date(2010, 2, 27), datetime.date(2010, 2, 28)]
>>> 
John La Rooy
  • 295,403
  • 53
  • 369
  • 502
4

Sort the list of dates and iterate over it, remembering the previous entry. If the difference between the previous and current entry is more than one day, you have missing days.

Here's one way to implement it:

from datetime import date, timedelta
from itertools import tee, izip

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    b.next()
    return izip(a, b)

def missing_dates(dates):
    for prev, curr in pairwise(sorted(dates)):
        i = prev
        while i + timedelta(1) < curr:
            i += timedelta(1)
            yield i

dates = [ date(2010, 1, 8),
          date(2010, 1, 2),
          date(2010, 1, 5),
          date(2010, 1, 1),
          date(2010, 1, 7) ]

for missing in missing_dates(dates):
    print missing

Output:

2010-01-03
2010-01-04
2010-01-06

Performance is O(n*log(n)) where n is the number of days in the span when the input is unsorted. As your list is already sorted, it will run in O(n).

Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
2
>>> from datetime import datetime, timedelta
>>> date_list = [datetime(2010, 2, 23),datetime(2010, 2, 24),datetime(2010, 2, 25),datetime(2010, 2, 26),datetime(2010, 3, 1),datetime(2010, 3, 2)]
>>> 
>>> date_set=set(date_list)         # for faster membership tests than list
>>> one_day = timedelta(days=1)
>>> 
>>> test_date = date_list[0]
>>> missing_dates=[]
>>> while test_date < date_list[-1]:
...     if test_date not in date_set:
...         missing_dates.append(test_date)
...     test_date += one_day
... 
>>> print missing_dates
[datetime.datetime(2010, 2, 27, 0, 0), datetime.datetime(2010, 2, 28, 0, 0)]

This also works for datetime.date objects, but the OP says the list is datetime.datetime objects

John La Rooy
  • 295,403
  • 53
  • 369
  • 502
2

USING A FOR LOOP

The imports you'll need:

import datetime
from datetime import date, timedelta

Let's say you have a sorted list called dates with several missing dates in it.

First select the first and last date:

start_date = dates[0]
end_date = dates[len(dates)-1]

Than count the number of days between these two dates:

numdays = (end_date - start_date).days

Than create a new list with all dates between start_date and end_date:

all_dates = []
for x in range (0, (numdays+1)):
all_dates.append(start_date + datetime.timedelta(days = x))

Than check with dates are in all_dates but not in dates by using a for loop with range and adding these dates to dates_missing:

dates_missing = []

for i in range (0, len(all_dates)):
   if (all_dates[i] not in dates):
       dates_missing.append(all_dates[i])
   else:
       pass

Now you'll have a list called dates_missing with all the missing dates.

Nelis
  • 91
  • 1
  • 11
1

Put the dates in a set and then iterate from the first date to the last using datetime.timedelta(), checking for containment in the set each time.

Ignacio Vazquez-Abrams
  • 776,304
  • 153
  • 1,341
  • 1,358
1

Here's an example for a pandas dataframe with a date column. If it's an index, then change df.Date to df.index.

import pandas as pd

df.Date = pd.to_datetime(df.Date) # ensure datetime format of date column
min_dt = df.Date.min() # get lowest date
max_dt = df.Date.max() # get highest date

dt_range = pd.date_range(min_dt, max_dt) # get all requisite dates in range
missing_dts = [d for d in dt_range if d not in df.Date] # list missing
print("There are {n} missing dates".format(n=len(missing_dts)))
Brndn
  • 676
  • 1
  • 7
  • 21
0
import datetime

DAY = datetime.timedelta(days=1)
# missing dates: a list of [start_date, end)
missing = [(d1+DAY, d2) for d1, d2 in zip(dates, dates[1:]) if (d2 - d1) > DAY]

def date_range(start_date, end, step=DAY):
    d = start_date
    while d < end:
        yield d
        d += step

missing_dates = [d for d1, d2 in missing for d in date_range(d1, d2)]
jfs
  • 399,953
  • 195
  • 994
  • 1,670
0

Using a list comprehension

>>> from datetime import date, timedelta
>>> d = [date(2010, 2, 23),date(2010, 2, 24),date(2010, 2, 25),date(2010, 2, 26),date(2010, 3, 1),date(2010, 3, 2)]
>>> date_set=set(d)
>>> missing = [x for x in (d[0]+timedelta(x) for x in range((d[-1]-d[0]).days)) if x not in date_set]

>>> missing
[datetime.date(2010, 2, 27), datetime.date(2010, 2, 28)]
John La Rooy
  • 295,403
  • 53
  • 369
  • 502
0

A good way of getting this done in Python is as follows. You need not worry about efficiency unless you have dates from multiple years in your list and this code always needs to run as per user interaction and yield output immediately.

  1. Get missing dates from one list (sorted or not)

Create a function that gives you all the dates from start_date to end_date. And use it.

import datetime

def get_dates(start_date, end_date):
    span_between_dates = range(end_date - start_date).days
    for index in span_between_dates + 1:
        # +1 is to make start and end dates inclusive.
        yield start_date + datetime.timedelta(index)

my_date_list = ['2017-03-05', '2017-03_07', ...]
# Edit my_date_list as per your requirement.
start_date = min(my_date_list)
end_date = max(my_date_list)
for current_date in get_dates(start_date, end_date)
    if date not in my_date_list:
        print date
  1. Get missing or overlapping dates between two date ranges.

get_dates function should be defined.

my_other_date_list = []  # your other date range
start_date = min(my_date_list)
end_date = max(my_date_list)
for current_date in get_dates(start_date, end_date)
    if (date in my_date_range) and (date in my_other_date_list):
        print ('overlapping dates between 2 lists:')
        print date
    elif (date in my_date_range) and (date not in my_other_date_list):
        print ('missing dates:')
        print date
Bishwas Mishra
  • 1,235
  • 1
  • 12
  • 25
0

Assuming you have imported the pandas library:

incomplete_dates = pd.DataFrame({'dates': ['2021-01-10', '2021-01-14', '2021-01-15','2021-01-20']})

incomplete_dates = pd.to_datetime(incomplete_dates['dates'])

complete_dates = pd.date_range(start="2021-01-10", end="2021-01-20", freq='D')
 
missing_dates = complete_dates.difference(incomplete_dates)

print (missing_dates)

>>> DatetimeIndex(['2021-01-11', '2021-01-12', '2021-01-13', '2021-01-16',
               '2021-01-17', '2021-01-18', '2021-01-19'],
              dtype='datetime64[ns]', freq=None)
  1. original dataset with missing dates

  2. convert string dates into datetime format

  3. complete range of time period

  4. function for finding missing dates

learn more from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.difference.html

Henul
  • 192
  • 1
  • 10