Identify groups of continuous time intervals in a list

Question

I have this list of dates:

dates = ['2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12', '2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2020-01', '2020-03', '2020-04', '2020-05', '2020-09', '2020-10']

What I want is to be able to detect that:

['2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12'] is a continuous time interval

['2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11'] is another one

then there is

['2020-01'],

['2020-03', '2020-04', '2020-05']

and ['2020-09', '2020-10']

I would appreciate your help. Thanks

Please provide an example of the output, one of what you've tried and explain what's not working with it. — Andre.IDK, Nov 29 '20 at 17:00

Dani Mesejo · Answer 1 · 2020-11-29T15:36:46.547

You could use itertools.groupby:

from itertools import groupby, count

dates = ['2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12', '2019-01', '2019-02', '2019-03', '2019-04',
         '2019-05', '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2020-01', '2020-03', '2020-04',
         '2020-05', '2020-09', '2020-10']
counter = count(0)

res = [list(group) for _, group in groupby(dates, key=lambda x: int(x.replace('-', '')) - next(counter))]

for g in res:
    print(g)

Output

['2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12']
['2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11']
['2020-01']
['2020-03', '2020-04', '2020-05']
['2020-09', '2020-10']

The code above is an adaptation of an old recipe for finding runs of consecutive numbers; see the examples in the older itertools documentation.
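To make the recipe concrete, here is a minimal sketch on plain integers. The sample list `numbers` is made up for illustration, and `enumerate` supplies the running index in place of a separate counter.

```python
from itertools import groupby

# Hypothetical sample data, not taken from the question.
numbers = [1, 2, 3, 7, 8, 10]

# Inside a run, the index and the value both grow by 1,
# so (value - index) stays constant and serves as the group key.
runs = [
    [value for _, value in group]
    for _, group in groupby(enumerate(numbers), key=lambda pair: pair[1] - pair[0])
]

print(runs)
```

Using `enumerate` also avoids shared state: the key no longer depends on a counter that must be reset between calls.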

The main idea is to group the input by the key function. To better understand what is happening, let's apply the key function to the values:

counter = count(0)  # fresh counter: the groupby call above already consumed the first one
res = list(map(lambda x: int(x.replace('-', '')) - next(counter), dates))
print(res)

Output (the key function applied to each element of dates)

[201707, 201707, 201707, 201707, 201707, 201707, 201895, 201895, 201895, 201895, 201895, 201895, 201895, 201895, 201895, 201895, 201895, 201984, 201985, 201985, 201985, 201988, 201988]

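As can be seen, each consecutive run of months is mapped to the same key. Within a run, both the integer value and the counter increase by 1 at every step, so their difference stays constant; a gap makes the value jump by more than 1, which changes the difference and starts a new group. To understand why this happens in more detail, check this question.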
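One caveat is worth noting: the integer key treats '2017-12' as 201712 and '2018-01' as 201801, which differ by 89 rather than 1, so a run that crosses a year boundary would be split. The data in the question has no such run, so the output above is unaffected. The following is a minimal sketch of a variant that handles that case, assuming sorted 'YYYY-MM' strings without duplicates; the helper names `month_ordinal` and `group_consecutive_months` are hypothetical, not part of the original answer.

```python
from itertools import groupby


def month_ordinal(ym):
    """Convert 'YYYY-MM' to a month count, so December -> January differs by 1."""
    year, month = ym.split('-')
    return int(year) * 12 + int(month)


def group_consecutive_months(dates):
    """Group sorted 'YYYY-MM' strings into runs of consecutive months."""
    return [
        [ym for _, ym in group]
        for _, group in groupby(
            enumerate(dates),
            key=lambda pair: month_ordinal(pair[1]) - pair[0],
        )
    ]


# Hypothetical sample with a run that crosses a year boundary.
sample = ['2017-11', '2017-12', '2018-01', '2018-03']
print(group_consecutive_months(sample))
```

Here the first three months land in one group and '2018-03' in its own, because the month ordinal increases by exactly 1 from December to January.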

As a side note, we need to call list(group) because group is an iterator, not a list.
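A small sketch of why the list(group) call matters, using made-up sample data: each group shares the underlying iterator with groupby, so once groupby has advanced, an earlier group no longer yields its items.

```python
from itertools import groupby

data = [1, 1, 2, 2, 2]  # hypothetical sample data

# Materialize each group while it is the current one: the items are kept.
eager = [list(group) for _, group in groupby(data)]

# Store the group iterators and read them only afterwards:
# groupby has already moved past the first group by then.
stale_groups = [group for _, group in groupby(data)]
first_group_later = list(stale_groups[0])

print(eager)
print(first_group_later)
```

The first print shows the full groups, while the second shows an empty list, which is why the answer converts each group to a list inside the comprehension.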

@dani im interested in understanding the nested loop, is there a way to explain it a bit ^_^? — ombk, Nov 29 '20 at 15:25
