Python splitting a datetime list based on missing days

Question

How can I split a list of datetimes that has missing dates into a list of lists, starting a new sublist wherever a day is missing?

Using the following example:

import datetime

date_list = [
        datetime.datetime(2012,1,1,0,0,0),
        datetime.datetime(2012,1,2,0,0,0),
        datetime.datetime(2012,1,4,0,0,0),
        datetime.datetime(2012,1,7,0,0,0),
        datetime.datetime(2012,1,8,0,0,0),
        ]

The result I'm looking for here is

[[datetime.datetime(2012,1,1,0,0,0), datetime.datetime(2012,1,2,0,0,0)],
 [datetime.datetime(2012,1,4,0,0,0)],
 [datetime.datetime(2012,1,7,0,0,0), datetime.datetime(2012,1,8,0,0,0)]]

I tried using groupby but I can't figure out what to use for the key.

[list(g) for k, g in itertools.groupby(date_list, key=lambda d: d.day)]

You'll probably find the second example in (an old version of) [the itertools docs](https://docs.python.org/2.6/library/itertools.html#examples) useful. On the other hand, if you don't care about being super fancy, [writing your own generator](http://stackoverflow.com/questions/21142231/group-consecutive-integers-and-tolerate-gaps-of-1/21142465#21142465) is pretty straightforward. – roippi, Dec 05 '14 at 00:18
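Along the lines of the comment's second suggestion, a hand-written generator could look like the sketch below. It is not code from the linked answer: `consecutive_runs` is a hypothetical name, and it assumes the input is already sorted in ascending order. It uses the same "more than one day apart" test as the for-loop answer further down.

```python
import datetime


def consecutive_runs(dates):
    """Yield lists of dates where each date is at most one day after the previous one.

    Assumes `dates` is sorted in ascending order.
    """
    run = []
    for d in dates:
        # A gap of more than one day closes the current run.
        if run and (d - run[-1]).days > 1:
            yield run
            run = []
        run.append(d)
    if run:  # emit the final run; an empty input yields nothing
        yield run


date_list = [
    datetime.datetime(2012, 1, 1),
    datetime.datetime(2012, 1, 2),
    datetime.datetime(2012, 1, 4),
    datetime.datetime(2012, 1, 7),
    datetime.datetime(2012, 1, 8),
]
print(list(consecutive_runs(date_list)))  # three runs: Jan 1-2, Jan 4, Jan 7-8
```

Because it is a generator, it also works lazily on long or streamed inputs; wrap it in `list()` when you need the full list of lists.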

Kevin Cherepski · Accepted Answer · 2014-12-05T00:38:20.327

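This works for the given example. Pair each date with its index using `enumerate`; within a run of consecutive days the index and the day of the month both go up by one, so "index minus day" stays constant and can serve as the `groupby` key: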

>>> import datetime
>>> date_list = [
...         datetime.datetime(2012,1,1,0,0,0),
...         datetime.datetime(2012,1,2,0,0,0),
...         datetime.datetime(2012,1,4,0,0,0),
...         datetime.datetime(2012,1,7,0,0,0),
...         datetime.datetime(2012,1,8,0,0,0),
...         ]
>>> import itertools
>>> [list(g) for k, g in itertools.groupby(enumerate(date_list), key=lambda ix: ix[0] - ix[1].day)]
[[(0, datetime.datetime(2012, 1, 1, 0, 0)), (1, datetime.datetime(2012, 1, 2, 0, 0))], [(2, datetime.datetime(2012, 1, 4, 0, 0))], [(3, datetime.datetime(2012, 1, 7, 0, 0)), (4, datetime.datetime(2012, 1, 8, 0, 0))]]

If you don't want the index in the result, drop it while building each group:

>>> [[v for i, v in g] for k, g in itertools.groupby(enumerate(date_list), key=lambda ix: ix[0] - ix[1].day)]
[[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 2, 0, 0)], [datetime.datetime(2012, 1, 4, 0, 0)], [datetime.datetime(2012, 1, 7, 0, 0), datetime.datetime(2012, 1, 8, 0, 0)]]
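One caveat: `.day` resets at each month boundary, so with this key 2012-01-31 and 2012-02-01 end up in separate groups. A variant (my own adjustment, not part of the answer) uses `toordinal()`, which counts days continuously across months and years. The helper name `split_on_missing_days` is hypothetical; it assumes a sorted input without repeated calendar days, and it compares calendar dates only, ignoring the time of day.

```python
import datetime
import itertools


def split_on_missing_days(dates):
    """Group a sorted list of dates/datetimes into runs of consecutive calendar days."""
    # Within a run of consecutive days, toordinal() grows by 1 exactly when the
    # enumerate index does, so (ordinal - index) stays constant and works as the
    # groupby key, even when a run crosses a month or year boundary.
    return [
        [d for _, d in group]
        for _, group in itertools.groupby(
            enumerate(dates), key=lambda pair: pair[1].toordinal() - pair[0]
        )
    ]


# With the .day key, Jan 31 and Feb 1 would fall into separate groups.
boundary = [
    datetime.datetime(2012, 1, 31),
    datetime.datetime(2012, 2, 1),
    datetime.datetime(2012, 2, 3),
]
print(split_on_missing_days(boundary))  # two groups: Jan 31 to Feb 1, then Feb 3
```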

ely · Answer 2 · 2014-12-05T01:09:36.840

Here is a boring for-loop helper function to do it.

def date_segments(dates):
    # Split a sorted list of datetimes into runs of consecutive days.
    if not dates:
        return []
    output = []
    cur_list = [dates[0]]
    for cur, prev in zip(dates[1:], dates):
        if (cur - prev).days > 1:   # gap of more than one day: start a new run
            output.append(cur_list)
            cur_list = [cur]
        else:
            cur_list.append(cur)
    output.append(cur_list)
    return output

which gives:

In [28]: date_segments(date_list)
Out[28]: 
[[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 2, 0, 0)],
 [datetime.datetime(2012, 1, 4, 0, 0)],
 [datetime.datetime(2012, 1, 7, 0, 0), datetime.datetime(2012, 1, 8, 0, 0)]]

If I define the itertools.groupby approach as a helper function named other_way as below:

from itertools import groupby
def other_way(date_list):
    return [[v for i, v in g] for k, g in groupby(enumerate(date_list),
                                                  key=lambda ix: ix[0] - ix[1].day)]

then for this admittedly small example timeit shows this for-loop approach to be slightly faster:

In [31]: %timeit date_segments(date_list) 
100000 loops, best of 3: 3.2 µs per loop

In [32]: %timeit other_way(date_list)
100000 loops, best of 3: 3.72 µs per loop

and I, for one, find the for-loop approach much more Pythonic and readable.

elyase · Answer 3 · 2014-12-05T01:27:02.863


You could build a key that "switches" (flips its value) whenever two adjacent dates are not consecutive:

class Switcher():
    def __call__(self, d):
        if not hasattr(self, 'prev'):    # first element: init switch
            self.switch = 1
        elif (d - self.prev).days > 1:   # not consecutive: invert switch
            self.switch *= -1
        self.prev = d                    # save current value
        return self.switch

Then you can use it like:

>>> from itertools import groupby
>>> [list(g) for k, g in groupby(date_list, key=Switcher())]
[[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 2, 0, 0)],
 [datetime.datetime(2012, 1, 4, 0, 0)],
 [datetime.datetime(2012, 1, 7, 0, 0), datetime.datetime(2012, 1, 8, 0, 0)]]

answered Dec 05 '14 at 00:48 by elyase, edited Dec 05 '14 at 01:27

If you are only making use of the `__call__` facilities of this class, why would you not just make it a function? Just delete the `class Switcher` line, fix the indentation, rename `__call__` to whatever you like, and compute the switch cases on `zip(date_list[1:], date_list)` ... It seems like this could only result in less code, and less confusing code. – ely Dec 05 '14 at 01:13
@prpl.mnky.dshwshr, The reason is that the key object needs to have memory (`self.prev`, `self.switch`), in order to remember the previous element/switch state. A function would be stateless. – elyase Dec 05 '14 at 01:20
That's why I said to compute the switch cases off of the zip, rather than pretending like they are state. I'm not arguing whether you *can* represent it this way, just that it's not a good use of a class. Also, you *can* have "state" in the function, either by making it a generator, or using closures. – ely Dec 05 '14 at 01:28
@prpl.mnky.dshwshr, Oh, now I see what you mean. Still, my purpose with this solution is that the OP can use his `groupby` line without modification. I would be interested in seeing how this can be implemented using a closure. BTW, +1: your solution is really fast. – elyase Dec 05 '14 at 01:33
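Since the comments ask how this could be done with a closure, here is one possible sketch (not code from the thread; `make_switcher` is a hypothetical name). It keeps the same state as `Switcher` (the previous date and the current switch value) in variables of an enclosing function, updated via `nonlocal`, which requires Python 3. It relies on `groupby` calling the key once per element, in order.

```python
import datetime
from itertools import groupby


def make_switcher():
    """Return a stateful key function that flips its value at every gap of more than one day."""
    prev = None
    switch = 1

    def key(d):
        nonlocal prev, switch
        if prev is not None and (d - prev).days > 1:
            switch = -switch  # not consecutive: flip the key so groupby starts a new group
        prev = d  # remember the current date for the next call
        return switch

    return key


date_list = [datetime.datetime(2012, 1, d) for d in (1, 2, 4, 7, 8)]
# Create a fresh key function for each groupby call, since it carries state.
groups = [list(g) for k, g in groupby(date_list, key=make_switcher())]
```

As with `Switcher()`, the OP's `groupby` line stays unchanged apart from the key argument.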
