How can I split a datetime list with missing dates into a list of lists based on missing dates?

Using the following example:

date_list = [
        datetime.datetime(2012,1,1,0,0,0), 
        datetime.datetime(2012,1,2,0,0,0), 
        datetime.datetime(2012,1,4,0,0,0), 
        datetime.datetime(2012,1,7,0,0,0),
        datetime.datetime(2012,1,8,0,0,0),
        ]

The result I'm looking for here is

[[datetime.datetime(2012,1,1,0,0,0), datetime.datetime(2012,1,2,0,0,0)],
[datetime.datetime(2012,1,4,0,0,0)], 
[datetime.datetime(2012,1,7,0,0,0), datetime.datetime(2012,1,8,0,0,0)]]

I tried using groupby but I can't figure out what to use for the key.

[list(g) for k, g in itertools.groupby(date_list, key=lambda d: d.day)]

Keith Morris
pyCthon
  • you'll probably find the second example in (an old version of) [the itertools docs](https://docs.python.org/2.6/library/itertools.html#examples) useful. On the other hand if you don't care about being super fancy, [writing your own generator](http://stackoverflow.com/questions/21142231/group-consecutive-integers-and-tolerate-gaps-of-1/21142465#21142465) is pretty straightforward. – roippi Dec 05 '14 at 00:18
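A minimal sketch of the "write your own generator" route mentioned in the comment above. The name `split_on_gaps` and the `max_gap` parameter are illustrative, not taken from the linked answer, and the input is assumed to be sorted in ascending order.

```python
import datetime


def split_on_gaps(dates, max_gap=datetime.timedelta(days=1)):
    """Yield runs of dates in which neighbours are at most max_gap apart.

    Assumes dates is sorted in ascending order.
    """
    run = []
    for d in dates:
        # A gap larger than max_gap closes the current run.
        if run and d - run[-1] > max_gap:
            yield run
            run = []
        run.append(d)
    if run:  # emit the final run (skipped for empty input)
        yield run


date_list = [
    datetime.datetime(2012, 1, 1),
    datetime.datetime(2012, 1, 2),
    datetime.datetime(2012, 1, 4),
    datetime.datetime(2012, 1, 7),
    datetime.datetime(2012, 1, 8),
]
segments = list(split_on_gaps(date_list))
```

Because it is a generator, the runs are produced lazily; wrap the call in `list(...)` when a list of lists is needed.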

3 Answers

This works for the given example, as long as all the dates fall within the same month (the key uses the day of the month):

>>> import datetime
>>> date_list = [
...         datetime.datetime(2012,1,1,0,0,0),
...         datetime.datetime(2012,1,2,0,0,0),
...         datetime.datetime(2012,1,4,0,0,0),
...         datetime.datetime(2012,1,7,0,0,0),
...         datetime.datetime(2012,1,8,0,0,0),
...         ]
>>> import itertools
>>> [list(g) for k, g in itertools.groupby(enumerate(date_list), key=lambda pair: pair[0] - pair[1].day)]
[[(0, datetime.datetime(2012, 1, 1, 0, 0)), (1, datetime.datetime(2012, 1, 2, 0, 0))], [(2, datetime.datetime(2012, 1, 4, 0, 0))], [(3, datetime.datetime(2012, 1, 7, 0, 0)), (4, datetime.datetime(2012, 1, 8, 0, 0))]]

This may be better if you don't want the index...

>>> [[v for i, v in g] for k, g in itertools.groupby(enumerate(date_list), key=lambda pair: pair[0] - pair[1].day)]
[[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 2, 0, 0)], [datetime.datetime(2012, 1, 4, 0, 0)], [datetime.datetime(2012, 1, 7, 0, 0), datetime.datetime(2012, 1, 8, 0, 0)]]
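One caveat with keying on `.day`: the day of the month resets at every month boundary, so a run such as Jan 31, Feb 1 would be split. A possible variant, sketched here under the assumption that the list is sorted and has at most one entry per calendar day, keys on `toordinal()` instead. The helper name `group_consecutive_days` is illustrative.

```python
import datetime
from itertools import groupby


def group_consecutive_days(dates):
    """Group sorted dates into runs of consecutive calendar days."""
    # Within a run, the ordinal and the index both rise by one per element,
    # so their difference stays constant; it changes when a day is skipped.
    keyed = groupby(enumerate(dates),
                    key=lambda pair: pair[1].toordinal() - pair[0])
    return [[d for _, d in run] for _, run in keyed]


dates = [
    datetime.datetime(2012, 1, 30),
    datetime.datetime(2012, 1, 31),
    datetime.datetime(2012, 2, 1),
    datetime.datetime(2012, 2, 3),
]
groups = group_consecutive_days(dates)
# Jan 30, Jan 31 and Feb 1 end up in one group; Feb 3 is on its own.
```

The lambda takes a single `pair` argument and indexes into it, which runs on both Python 2 and Python 3.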
Kevin Cherepski

Here is a boring for-loop helper function to do it.

def date_segments(dates):
    if not dates:                       # nothing to split
        return []
    output = []
    cur_list = [dates[0]]
    # pair each date with its predecessor
    for cur, prev in zip(dates[1:], dates):
        if (cur - prev).days > 1:       # gap of more than one day: start a new segment
            output.append(cur_list)
            cur_list = [cur]
        else:
            cur_list.append(cur)
    output.append(cur_list)
    return output

which gives:

In [28]: date_segments(date_list)
Out[28]: 
[[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 2, 0, 0)],
 [datetime.datetime(2012, 1, 4, 0, 0)],
 [datetime.datetime(2012, 1, 7, 0, 0), datetime.datetime(2012, 1, 8, 0, 0)]]

If I define the itertools.groupby approach as a helper function named other_way as below:

from itertools import groupby
def other_way(date_list):
    return [[v for i, v in g] for k, g in groupby(enumerate(date_list),
                                                  key=lambda pair: pair[0] - pair[1].day)]

then for this admittedly small example timeit shows this for-loop approach to be slightly faster:

In [31]: %timeit date_segments(date_list) 
100000 loops, best of 3: 3.2 µs per loop

In [32]: %timeit other_way(date_list)
100000 loops, best of 3: 3.72 µs per loop

and I, for one, find the for-loop approach much more Pythonic and readable.
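For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard-library `timeit` module. Both functions are restated so the script is self-contained, with `other_way` written in a form that also runs on Python 3. The loop counts are arbitrary choices, and the timings will depend on the machine and Python version.

```python
import datetime
import timeit
from itertools import groupby


def date_segments(dates):
    """For-loop approach: start a new segment after a gap of more than one day."""
    output = []
    cur_list = [dates[0]]
    for cur, prev in zip(dates[1:], dates):
        if (cur - prev).days > 1:
            output.append(cur_list)
            cur_list = [cur]
        else:
            cur_list.append(cur)
    output.append(cur_list)
    return output


def other_way(dates):
    """groupby approach keyed on index minus day of the month."""
    return [[v for i, v in g]
            for k, g in groupby(enumerate(dates),
                                key=lambda pair: pair[0] - pair[1].day)]


date_list = [
    datetime.datetime(2012, 1, 1),
    datetime.datetime(2012, 1, 2),
    datetime.datetime(2012, 1, 4),
    datetime.datetime(2012, 1, 7),
    datetime.datetime(2012, 1, 8),
]

# Both approaches should agree on the example before they are timed.
same_result = date_segments(date_list) == other_way(date_list)

loops = 10000
for func in (date_segments, other_way):
    # Best of 3 runs of `loops` calls each, reported per call.
    best = min(timeit.repeat(lambda: func(date_list), number=loops, repeat=3))
    print("%s: %.2f us per loop" % (func.__name__, best / loops * 1e6))
```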

ely

You could build a key that "switches" whenever two adjacent dates are not consecutive:

class Switcher():
    def __call__(self, d):
        if not hasattr(self, 'prev'):    # first element: init switch
            self.switch = 1
        elif (d - self.prev).days > 1:   # not consecutive: invert switch
            self.switch *= -1
        self.prev = d                    # save current value
        return self.switch

Then you can use it like:

>>> from itertools import groupby
>>> [list(g) for k, g in groupby(date_list, key=Switcher())]
[[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 2, 0, 0)],
 [datetime.datetime(2012, 1, 4, 0, 0)],
 [datetime.datetime(2012, 1, 7, 0, 0), datetime.datetime(2012, 1, 8, 0, 0)]]
elyase
  • If you are only making use of the `__call__` facilities of this class, why would you not just make it a function? Just delete the `class Switcher` line, move the indentation, and change the name `__call__` to whatever, and just compute the switch cases on `zip(date_list[1:], date_list)` ... It seems like this could only result in less code and less confusing code. – ely Dec 05 '14 at 01:13
  • @prpl.mnky.dshwshr, The reason is that the key object needs to have memory (`self.prev`, `self.switch`), in order to remember the previous element/switch state. A function would be stateless. – elyase Dec 05 '14 at 01:20
  • That's why I said to compute the switch cases off of the zip, rather than pretending like they are state. I'm not arguing whether you *can* represent it this way, just that it's not a good use of a class. Also, you *can* have "state" in the function, either by making it a generator, or using closures. – ely Dec 05 '14 at 01:28
  • @prpl.mnky.dshwshr, Oh, now I see what you mean. Still, my purpose with this solution is that the OP can use his `groupby` line without modification. I would be interested in seeing how this can be implemented using a closure. BTW +1, your solution is really fast. – elyase Dec 05 '14 at 01:33
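One way the closure idea raised in these comments might look, as a sketch only. The factory name `make_switch_key` is hypothetical, and `nonlocal` requires Python 3. The key keeps `prev` and `switch` in the enclosing scope rather than on an instance, and the `groupby` line stays the same apart from the key.

```python
import datetime
from itertools import groupby


def make_switch_key():
    """Return a key function that flips sign after a gap of more than one day."""
    prev = None
    switch = 1

    def key(d):
        nonlocal prev, switch
        # Not consecutive with the previous date: invert the switch.
        if prev is not None and (d - prev).days > 1:
            switch = -switch
        prev = d  # remember the current value for the next call
        return switch

    return key


date_list = [
    datetime.datetime(2012, 1, 1),
    datetime.datetime(2012, 1, 2),
    datetime.datetime(2012, 1, 4),
    datetime.datetime(2012, 1, 7),
    datetime.datetime(2012, 1, 8),
]
result = [list(g) for k, g in groupby(date_list, key=make_switch_key())]
```

As with `Switcher()`, the key remembers the last date it saw, so a fresh key should be created for every `groupby` call.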