
I've got an array of dates that can contain multiple date ranges in it.

dates = [
 '2020-01-01',
 '2020-01-02',
 '2020-01-03',
 '2020-01-06',
 '2020-01-07',
 '2020-01-08'
]

In this example, the list contains two separate ranges of consecutive dates (2020-01-01 to 2020-01-03 and 2020-01-06 to 2020-01-08).

I'm attempting to figure out how I would loop through this list and find all the consecutive date ranges.

One of the articles I'm looking at (How to detect if dates are consecutive in Python?) seems to have a good approach; however, I'm struggling to implement its logic in my use case.

pyFiddler

5 Answers


The more-itertools library has a function called consecutive_groups that does this for you.

Or you can view its source code and copy its approach:

from datetime import datetime
from itertools import groupby
from operator import itemgetter

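def consecutive_groups(iterable, ordering=lambda x: x):
    # enumerate() pairs each item with its position. Within a run of consecutive
    # values, position - ordering(item) stays constant, so groupby() starts a new
    # group exactly where a gap occurs. Assumes sorted, duplicate-free input.
    for k, g in groupby(enumerate(iterable), key=lambda x: x[0] - ordering(x[1])):
        # Drop the positions and yield only the original items of the group
        yield map(itemgetter(1), g)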

Then to use the function:

for g in consecutive_groups(dates, lambda x: datetime.strptime(x, '%Y-%m-%d').toordinal()):
    print(list(g))

Or, more appropriately, using a named function instead of a lambda:

def to_date(date):
    return datetime.strptime(date, '%Y-%m-%d').toordinal()

for g in consecutive_groups(dates, to_date):
    print(list(g))

Output:

['2020-01-01', '2020-01-02', '2020-01-03']
['2020-01-06', '2020-01-07', '2020-01-08']
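consecutive_groups only groups neighbours whose day numbers differ by exactly 1, so it expects input that is already in order and free of duplicates: an out-of-order or repeated date splits what should be one run. A minimal sketch, assuming Python 3.7+ for date.fromisoformat and reusing the shuffled pto list from another answer on this page; to_ordinal is a hypothetical stand-in for to_date above:

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def consecutive_groups(iterable, ordering=lambda x: x):
    # Same index-minus-ordinal grouping as the function above
    for _, g in groupby(enumerate(iterable), key=lambda x: x[0] - ordering(x[1])):
        yield map(itemgetter(1), g)


def to_ordinal(s):
    # Hypothetical helper: 'YYYY-MM-DD' string -> integer day number
    return date.fromisoformat(s).toordinal()


# Shuffled input, as in the pto list from another answer
pto = ['2020-01-03', '2020-01-08', '2020-01-02', '2020-01-07', '2020-01-01', '2020-01-06']

# ISO 'YYYY-MM-DD' strings sort in date order, so plain sorted() suffices;
# set() drops duplicates, which would otherwise split a run in two
ordered = sorted(set(pto))
groups = [list(g) for g in consecutive_groups(ordered, to_ordinal)]
print(groups)
```

Without the sort, the shuffled list comes back as six one-date groups, because no two neighbours are consecutive days.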
Jab
  • Excellent solution. Maybe you could explain the key concept it is based on, since it looks a bit cryptic: enumerate provides a continuous sequence, and the difference between each date and that sequence remains constant until there is a gap. That change triggers the grouping. Simple, efficient and very elegant. – Pynchia Jan 17 '20 at 05:25
  • Much cleaner and more readable than my solution. And I learned about more-itertools :) Thank you @jab – pyFiddler Jan 17 '20 at 17:21
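To make the key concept from the first comment concrete, this sketch computes the grouping key (position minus day ordinal) for the question's dates. It uses date.fromisoformat (Python 3.7+) in place of strptime purely for brevity:

```python
from datetime import date

dates = ['2020-01-01', '2020-01-02', '2020-01-03',
         '2020-01-06', '2020-01-07', '2020-01-08']

# For each position i, compute i - ordinal(date). Inside a run of consecutive
# days both terms grow by 1 per step, so the difference stays constant.
keys = [i - date.fromisoformat(d).toordinal() for i, d in enumerate(dates)]
print(keys)
```

The first three keys are equal, the last three are equal, and the key drops by 2 between them because 2020-01-04 and 2020-01-05 are missing. groupby() only compares neighbouring keys, so that change is what starts the second group.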

This represents a single-date "range" as a pair of two identical dates, and it assumes the input list is already sorted:

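from datetime import datetime

def makedate(s):
    return datetime.strptime(s, "%Y-%m-%d")

def splitIntoRanges(dates):
    """Return (start, end) string pairs; the input must already be sorted."""
    if not dates:
        return []
    ranges = []
    start_s = last_s = dates[0]
    last = makedate(start_s)
    for curr_s in dates[1:]:
        curr = makedate(curr_s)
        # A gap of more than one day closes the current range
        if (curr - last).days > 1:
            ranges.append((start_s, last_s))
            start_s = curr_s
        last_s = curr_s
        last = curr
    # Close the final range
    return ranges + [(start_s, last_s)]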
Scott Hunter

I took a similar, though definitely less elegant, approach to @Scott's:

from datetime import datetime

ranges = []

dates = [datetime.strptime(date, '%Y-%m-%d') for date in dates]
start = dates[0]

for i in range(1, len(dates)):
    # A gap of more than one day closes the current range
    if (dates[i] - dates[i - 1]).days > 1:
        end = dates[i - 1]
        ranges.append(f'{start:%Y-%m-%d} to {end:%Y-%m-%d}')
        start = dates[i]

# The loop only closes a range at a gap, so close the last one here
# (this also covers a final single date that follows a gap)
ranges.append(f'{start:%Y-%m-%d} to {dates[-1]:%Y-%m-%d}')
whege

I found the key to my solution in a second post and pieced it together.

There are two parts to my issue:

  1. How do I represent a list of dates in an effective manner?

Answer: https://stackoverflow.com/a/9589929/2150673

import datetime

pto = [
    '2020-01-03',
    '2020-01-08',
    '2020-01-02',
    '2020-01-07',
    '2020-01-01',
    '2020-01-06'
]

ordinal_dates = [datetime.datetime.strptime(i, '%Y-%m-%d').toordinal() for i in pto]

  2. Once you have a list of dates in integer representation, you can simply look for consecutive integers, take the lower and upper bound of each range, and then convert them back to yyyy-mm-dd format.

Answer: https://stackoverflow.com/a/48106843

def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))
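The ranges() helper is compact but dense. Tracing its steps on a small, made-up list of already sorted integers shows what each line produces:

```python
nums = [1, 2, 3, 6, 7, 8]  # made-up input: two runs, 1-3 and 6-8

# Pairs of neighbours with a gap between them: each pair is (end of one run, start of the next)
gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]   # [[3, 6]]

# First value + all gap boundaries + last value = every range edge in order
flat = nums[:1] + sum(gaps, []) + nums[-1:]                     # [1, 3, 6, 8]

# zip() pulls two items from the SAME iterator each time, pairing edges up
edges = iter(flat)
pairs = list(zip(edges, edges))                                 # [(1, 3), (6, 8)]
print(pairs)
```

Because both arguments to zip() are one shared iterator, consecutive edges are consumed two at a time, which turns the flat edge list into (start, end) tuples.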

My complete function:

import datetime

def get_date_ranges(pto_list: list) -> list:
    # Convert 'YYYY-MM-DD' strings to integer day numbers
    pto_dates = [datetime.datetime.strptime(i, '%Y-%m-%d').toordinal() for i in pto_list]
    # Sort and deduplicate, then find the runs of consecutive integers
    nums = sorted(set(pto_dates))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    ordinal_ranges = list(zip(edges, edges))
    # Convert each (start, end) pair back to 'YYYY-MM-DD' strings
    date_bounds = []
    for start, end in ordinal_ranges:
        date_bounds.append((
            datetime.datetime.fromordinal(start).strftime('%Y-%m-%d'),
            datetime.datetime.fromordinal(end).strftime('%Y-%m-%d')
        ))
    return date_bounds
pyFiddler
  • While this does answer my question, I am interested in finding optimizations for it. Ashamed to say, I don't fully understand every part of this function and need to spend some time with these basic Python functions. – pyFiddler Jan 16 '20 at 17:53
  • @Jab's solution is the one and only worth picking. Elegant and efficient – Pynchia Jan 17 '20 at 05:19
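On the question of simplifying get_date_ranges: one possible alternative (not the approach from the linked answers) is to walk the sorted, deduplicated dates once, extending the current range when the next date is exactly one day later and starting a new one otherwise. This avoids the intermediate gaps/edges lists and the sum(gaps, []) call, which copies the accumulated list on every addition. A minimal sketch, assuming Python 3.7+ for date.fromisoformat; the function name is hypothetical:

```python
from datetime import date, timedelta


def get_date_ranges_single_pass(pto_list):
    """Hypothetical variant: return (start, end) 'YYYY-MM-DD' pairs for each run."""
    # Parse, deduplicate and sort the dates
    days = sorted({date.fromisoformat(s) for s in pto_list})
    ranges = []  # each entry is a mutable [start, end] pair
    for d in days:
        if ranges and d - ranges[-1][1] == timedelta(days=1):
            ranges[-1][1] = d        # next day: extend the current range
        else:
            ranges.append([d, d])    # gap (or first date): start a new range
    return [(start.isoformat(), end.isoformat()) for start, end in ranges]


pto = ['2020-01-03', '2020-01-08', '2020-01-02',
       '2020-01-07', '2020-01-01', '2020-01-06']
print(get_date_ranges_single_pass(pto))
```

Like get_date_ranges, it accepts unsorted input and returns a single date as a (date, date) pair.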

You can find all the consecutive date ranges and append them to a list of lists, accessing each range by its index, but I prefer using keys in a dictionary for readability.

Here is how (please read the comments in the code):

from datetime import datetime

dates = [datetime.strptime(d, "%Y-%m-%d") for d in dates]  # parse each string into a datetime
date_ints = [d.toordinal() for d in dates]  # toordinal() returns the day number, counting 0001-01-01 as day 1
ranges = {}
arange = []
prev = None
j = 1
for date, i in zip(dates, date_ints):  # iterate through the dates and their day numbers together
    if prev is None or i - 1 == prev:  # first date, or the day numbers are in sequence
        arange.append(date.strftime("%Y-%m-%d"))
    else:
        ranges[f'Range{j}'] = tuple(arange)  # numbers are no longer in sequence: store the finished range
        j += 1
        arange = [date.strftime("%Y-%m-%d")]  # start a new range with the current date
    prev = i
ranges[f'Range{j}'] = tuple(arange)  # store the final range
print(ranges)
print(ranges['Range1'])  # access a range by its key
print(ranges['Range2'])

Output:

{'Range1': ('2020-01-01', '2020-01-02', '2020-01-03'), 'Range2': ('2020-01-06', '2020-01-07', '2020-01-08')}
('2020-01-01', '2020-01-02', '2020-01-03')
('2020-01-06', '2020-01-07', '2020-01-08')
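The same Range1/Range2 dictionary can also be built more compactly by reusing the position-minus-ordinal grouping from the first answer. A minimal sketch, assuming the input is sorted, duplicate-free 'YYYY-MM-DD' strings and Python 3.7+ for date.fromisoformat:

```python
from datetime import date
from itertools import groupby

dates = ['2020-01-01', '2020-01-02', '2020-01-03',
         '2020-01-06', '2020-01-07', '2020-01-08']

# Group neighbours whose (position - day number) key is equal, i.e. consecutive days
groups = groupby(enumerate(dates),
                 key=lambda p: p[0] - date.fromisoformat(p[1]).toordinal())

# Number the groups from 1 and keep only the date strings of each group
ranges = {f'Range{n}': tuple(d for _, d in g)
          for n, (_, g) in enumerate(groups, start=1)}
print(ranges['Range1'])
print(ranges['Range2'])
```

Each group is turned into a tuple inside the comprehension before groupby() moves on, which matters because groupby() invalidates a group once the next one is requested.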