I have this list of dates:

dates = ['2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12', '2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2020-01', '2020-03', '2020-04', '2020-05', '2020-09', '2020-10']

What I want is to be able to detect that:

['2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12'] is a continuous time interval

['2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11'] is another one

then there is

['2020-01'],

['2020-03', '2020-04', '2020-05']

and ['2020-09', '2020-10']

I would appreciate your help. Thanks

1 Answer

You could use itertools.groupby:

from itertools import groupby, count

dates = ['2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12', '2019-01', '2019-02', '2019-03', '2019-04',
         '2019-05', '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2020-01', '2020-03', '2020-04',
         '2020-05', '2020-09', '2020-10']
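# Running index 0, 1, 2, ... advanced once per element by the key function
counter = count(0)

# Consecutive months within the same year share the same (YYYYMM - index) key
res = [list(group) for _, group in groupby(dates, key=lambda x: int(x.replace('-', '')) - next(counter))]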

for g in res:
    print(g)

Output

['2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12']
['2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11']
['2020-01']
['2020-03', '2020-04', '2020-05']
['2020-09', '2020-10']

The code above is an adaptation of an old recipe for finding runs of consecutive numbers; see the examples here.

The main idea is to group the input by the key function. To better understand what is happening, let's apply the key function to the values:

counter = count(0)  # fresh counter: the one above was already consumed by groupby
res = list(map(lambda x: int(x.replace('-', '')) - next(counter), dates))
print(res)

Output (mapping key to the elements of dates)

[201707, 201707, 201707, 201707, 201707, 201707, 201895, 201895, 201895, 201895, 201895, 201895, 201895, 201895, 201895, 201895, 201895, 201984, 201985, 201985, 201985, 201988, 201988]

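As can be seen, each consecutive run of months is mapped to the same key: within a run both the numeric value and the index increase by 1 at every step, so their difference stays constant, while a gap makes the value jump by more than 1 and the difference changes. For more detail on why this happens, check this question.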
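The constant-difference idea is easier to see on plain integers. The following is only a small sketch on a made-up list (values is not part of the question), using enumerate instead of a shared counter:

```python
from itertools import groupby

values = [1, 2, 3, 7, 8, 10]  # made-up sample data

# Within a run, value and index both grow by 1, so value - index is constant.
keys = [v - i for i, v in enumerate(values)]
print(keys)  # [1, 1, 1, 4, 4, 5]

# Group (index, value) pairs by that difference, then keep only the values.
runs = [[v for _, v in grp]
        for _, grp in groupby(enumerate(values), key=lambda p: p[1] - p[0])]
print(runs)  # [[1, 2, 3], [7, 8], [10]]
```

Using enumerate keeps the index tied to each element, so there is no external counter to reset between runs.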

As a side note, we need list(group) because each group is a lazy iterator, not a list, and it is no longer valid once groupby advances to the next group.
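One caveat: with the YYYYMM integer key, a run that crosses a year boundary would be split, because the key jumps from 201712 to 201801. The list in the question has no such run, so the output above is unaffected. A possible variant, assuming the input is sorted and has no duplicates, converts each date to a month count first (month_index and group_consecutive_months are names made up for this sketch):

```python
from itertools import groupby

def month_index(ym):
    """Map 'YYYY-MM' to a month count, so December -> January differs by 1."""
    year, month = ym.split('-')
    return int(year) * 12 + int(month)

def group_consecutive_months(dates):
    """Split a sorted list of 'YYYY-MM' strings into runs of consecutive months."""
    return [[d for _, d in grp]
            for _, grp in groupby(enumerate(dates),
                                  key=lambda p: month_index(p[1]) - p[0])]

sample = ['2017-11', '2017-12', '2018-01', '2018-03']  # made-up sample
print(group_consecutive_months(sample))
# [['2017-11', '2017-12', '2018-01'], ['2018-03']]
```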

Dani Mesejo
@dani I'm interested in understanding the nested loop, is there a way to explain it a bit ^_^? – ombk Nov 29 '20 at 15:25
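On the comment above: the one-liner is a list comprehension wrapped around groupby rather than a true nested loop. As a rough sketch, the same logic can be written as an explicit loop, shown here on a shortened copy of the list (the helper name key is only for illustration):

```python
from itertools import groupby, count

dates = ['2020-01', '2020-03', '2020-04', '2020-05', '2020-09', '2020-10']
counter = count(0)

def key(x):
    # 'YYYY-MM' -> YYYYMM as an int, minus the position of x in the list
    return int(x.replace('-', '')) - next(counter)

res = []
for _, group in groupby(dates, key=key):
    # group is a lazy iterator over one run; turn it into a list before moving on
    res.append(list(group))

print(res)
# [['2020-01'], ['2020-03', '2020-04', '2020-05'], ['2020-09', '2020-10']]
```

groupby calls key once per element, in order, which is why next(counter) lines up with each element's position.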