1

Say I have a list of objects. Each of these has a string representing a date (parseable by dateutil). How can I go about grouping these in a list of lists, in which each sublist contains consecutive (within 5 minutes) objects? For example:

o1.time = "2016-03-01 23:25:00-08:00"
o2.time = "2016-03-01 23:30:00-08:00"
o3.time = "2016-03-01 23:35:00-08:00"
o4.time = "2016-03-02 12:35:00-08:00"

list1 = [o1, o2, o3, o4]
list2 = group_by_time(list1)

at which point list2 would be

[[o1,o2,o3],[o4]]

It seems like there should be a python solution using lambdas or itertools along with dateutil, but my google schools are failing me.

Thanks!

Championcake
  • 193
  • 3
  • 10

3 Answers3

3

Take a look at groupby function from itertools. It takes a list of objects and groups them according to a key function. Your code could look like this

from dateutil.parser import parse
from itertools import groupby

def rounded_date(item):
    d = parse(item.time)
    # round date
    return d

grouped_items = groupby(items, keyfunc=rounded_date)

have a look at this question to find out how to round datetimes: How to round the minute of a datetime object python

Community
  • 1
  • 1
linqu
  • 11,320
  • 8
  • 55
  • 67
  • Groupby seems like what I'm looking for. But given this, I can't just round the datetimes down. I need to group the consecutive objects, which may span more than I'm rounding. – Championcake Mar 03 '16 at 07:58
  • you mean you want to preserve order in the grouped list? you could do that in a subsequent step: ```for items in grouped_items: items.sort(key=lambda i:parse(i.item))``` – linqu Mar 03 '16 at 08:03
  • No. I'm saying that in your example, each of the dates must be equal for them to be grouped. Each of the times are five minutes apart, but together could span multiple hours, so I'm not able to round them. The part You've shown is the good and all, but the missing part is what I can't figure out. – Championcake Mar 03 '16 at 08:08
  • I see, I got you question wrong. So it's more like a clustering, you could do it with a list iteration. I will sketch it in another answer – linqu Mar 03 '16 at 08:13
1

My previous answer was not exactly solving the problem. You want to cluster all consequent items that have less than 5 minutes between each other. There are cluster algorithms you might have a look at, but with some simple lines of code this problem can also get solved. Btw there are many different ways to do this, this is just one:

from datetime import timedelta

timedeltas = [timedelta(0)]
for i in range(1, len(items)):
    delta = parse(item[i].time) - parse(item[i-1].time)
    timedeltas.add(delta)

split_indices = [i for i in range(0, len(deltas)) if timedeltas[i] > timedelta(minutes=5)]

the rest should be easy

linqu
  • 11,320
  • 8
  • 55
  • 67
1

Here's a generator that yields groups of consecutive objects:

import datetime
import dateutil.parser

five_minutes = datetime.timedelta(minutes=5)

def group_by_time(objects):
    objects = iter(objects)
    obj = next(objects)
    last = dateutil.parser.parse(obj.time)
    group = [obj]
    for obj in objects:
        time = dateutil.parser.parse(obj.time)
        if time > last + five_minutes:
            yield group
            group = []
        group.append(obj)
        last = time
    else:
        yield group
Thomas Lotze
  • 5,153
  • 1
  • 16
  • 16