I am trying to aggregate data by calendar day (month and day of month), across years. I want to group every Jan 1st from 2010-2019 together, then every Jan 2nd, and so on.

I generate a list of dates from 2010-2019.

import datetime as dt

def list_dates(start, end):
    # Add 1 so that the end date itself is included.
    num_days = (end - start).days + 1
    return [start + dt.timedelta(days=x) for x in range(num_days)]


start_date = dt.date(2010, 1, 1)
end_date = dt.date(2019, 12, 31)
date_list = list_dates(start_date, end_date)

Now I am having trouble subdividing this list into 366 separate lists, each holding the same calendar day from every year. Would it be best to use some sort of dt.timedelta() operation?

Eli Turasky
You should consider having a look at the `pandas` library. There, you could simply group your data by day-of-year (similar to [this question](https://stackoverflow.com/questions/26646191/pandas-groupby-month-and-year)). – FObersteiner Jun 11 '20 at 17:17
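
The pandas suggestion in the comment above could look roughly like the sketch below. It assumes pandas is installed, and the column names `value`, `month` and `day` are placeholders that do not come from the question. It groups on the (month, day) pair rather than on the day-of-year number, because the day-of-year of every date after Feb 28 shifts by one in leap years, which would mix, for example, Feb 29 with Mar 1.

```python
import pandas as pd

# One row per day from 2010-01-01 to 2019-12-31 (both ends included).
df = pd.DataFrame({"date": pd.date_range("2010-01-01", "2019-12-31", freq="D")})
df["value"] = range(len(df))  # placeholder data, stands in for the real measurements

# Derive the grouping keys from the date column.
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day

grouped = df.groupby(["month", "day"])
sizes = grouped.size()            # number of rows per calendar day
means = grouped["value"].mean()   # example aggregation per calendar day

print(sizes.loc[(1, 1)])   # rows that fall on Jan 1st
print(sizes.loc[(2, 29)])  # rows that fall on Feb 29th (leap years only)
```

Any other aggregation (`sum`, `max`, a custom function through `agg`) can replace `mean` in the same place.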

1 Answer

Here is a solution with itertools.groupby:

import datetime as dt
from itertools import groupby

def list_dates(start, end):
    # Add 1 so that the end date itself is included.
    num_days = (end - start).days + 1
    dates = [start + dt.timedelta(days=x) for x in range(num_days)]
    # groupby only merges adjacent items, so sort by (month, day) first.
    sorted_dates = sorted(dates, key=lambda date: (date.month, date.day))
    grouped_dates = [list(g) for _, g in groupby(sorted_dates, key=lambda date: (date.month, date.day))]
    return grouped_dates


start_date = dt.date(2010, 1, 1)
end_date = dt.date(2019, 12, 31)
date_list = list_dates(start_date, end_date)
print(date_list[0])

Output:

[datetime.date(2010, 1, 1), datetime.date(2011, 1, 1), datetime.date(2012, 1, 1), datetime.date(2013, 1, 1), 
datetime.date(2014, 1, 1), datetime.date(2015, 1, 1), datetime.date(2016, 1, 1), datetime.date(2017, 1, 1), 
datetime.date(2018, 1, 1), datetime.date(2019, 1, 1)]
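
As a possible alternative that avoids the sort, the same grouping can be sketched with a dictionary keyed by (month, day). This is only an illustration, not part of the original answer; the function name `group_by_month_day` is made up for this example.

```python
import datetime as dt
from collections import defaultdict

def group_by_month_day(start, end):
    """Map (month, day) to the list of dates in [start, end] that fall on that day."""
    groups = defaultdict(list)
    num_days = (end - start).days + 1  # +1 so that `end` is included
    for offset in range(num_days):
        date = start + dt.timedelta(days=offset)
        groups[(date.month, date.day)].append(date)
    return groups

groups = group_by_month_day(dt.date(2010, 1, 1), dt.date(2019, 12, 31))
print(len(groups))           # number of distinct calendar days
print(len(groups[(2, 29)]))  # only leap years contribute to this key
```

For 2010-2019 this gives 366 keys. Every key holds ten dates except (2, 29), which holds two (2012 and 2016). Looking up a calendar day is then a direct dictionary access such as `groups[(1, 2)]` for Jan 2nd.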
Asocia