
I have a list of dates with some days missing. I'm trying to get an array of the missing date ranges as output. At the moment I can get the desired output as map objects, but I cannot convert them into a single array. My code is as follows:

import os
import pandas as pd
import numpy as np
from datetime import datetime
from itertools import groupby
from operator import itemgetter

Converting a list of strings into datetime.date objects. newdates is my original data with the missing days:

In[1]:
newdates = [datetime.strptime(date, '%Y-%m-%d').date() for date in newdates]

print(newdates)
Out[1]: 
[datetime.date(2013, 11, 5),..., datetime.date(2013, 12, 31)]

Creating a date range for my desired year and using .difference to output a list of strings of dates that were missing in my original data.

In[2]:    
TEST = pd.date_range(start='2013-01-01', end='2013-12-31').difference(newdates)
TEST = TEST.strftime('%Y-%m-%d').tolist()

I found code in @jab's answer to this question (Split a list of dates into subsets of consecutive dates) which groups the consecutive days. It outputs the desired data, but as multiple map objects.

def consecutive_groups(iterable, ordering=lambda x: x):
    # index minus ordinal stays constant within a run of consecutive days
    for k, g in groupby(enumerate(iterable), key=lambda x: x[0] - ordering(x[1])):
        yield map(itemgetter(1), g)

for g in consecutive_groups(TEST, lambda x: datetime.strptime(x, '%Y-%m-%d').toordinal()):
    print(list(g))

Out[2]:
['2013-01-01',..., '2013-11-04']
['2013-11-24']

I've tried to convert the map objects to lists (I would like a single array, though) with the following:

for g in consecutive_groups(TEST, lambda x: datetime.strptime(x, '%Y-%m-%d').toordinal()):
    dates = list(g)

This gives me a list built from the final map object only, not from all of them.

I've also tried using np.fromiter, but can't figure out how to get a range.

In conclusion, I would like to convert the output (list(g)) to an array which would look like this:

[['2013-01-01',..., '2013-11-04'],['2013-11-24']]
Josh Alt
  • Could you explain "I'm trying to get an array of date ranges with no dates as an output"? What do you mean by "no of dates"? – hammi Aug 08 '20 at 18:00
  • To explain better @Hammad: I have a list of dates from the year 2013, and this list has multiple days missing. Let's say it is missing the months of January and November. I would like the output to be a multidimensional array looking like this: [['2013-01-01', ..., '2013-01-31'], ['2013-11-01', ..., '2013-11-30']], with the ellipses (...) representing a date for every day between the two dates. Essentially, I want to group consecutive days. – Josh Alt Aug 08 '20 at 18:10

1 Answer

for k, g in groupby(enumerate(iterable), key=lambda x: x[0] - ordering(x[1])):
    yield map(itemgetter(1),g)

In Python 2, map returned a list directly. In Python 3 it returns a lazy iterator, which is slightly less helpful here, so just wrap it in list:

for k, g in groupby(enumerate(iterable), key=lambda x: x[0] - ordering(x[1])):
    yield list(map(itemgetter(1),g))

Then the full result is just:

dates = list(consecutive_groups(...))

Without changing the function, you would just need a list comprehension that calls list on each element, like this:

dates = [list(group) for group in consecutive_groups(...)]
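
One caveat with the unmodified function, sketched below on hypothetical sample data: itertools.groupby shares a single underlying iterator with its groups, so each group needs to be turned into a list before the generator moves on to the next one. The name lazy_groups is only an illustrative stand-in for the original, unmodified consecutive_groups.

```python
from itertools import groupby
from operator import itemgetter


def lazy_groups(iterable, ordering=lambda x: x):
    # Same logic as the original function: yields un-materialized map objects
    for _, g in groupby(enumerate(iterable), key=lambda x: x[0] - ordering(x[1])):
        yield map(itemgetter(1), g)


data = [1, 2, 3, 7, 8]  # hypothetical sample with two consecutive runs

# Each group is converted as soon as it is produced, so nothing is lost
eager = [list(g) for g in lazy_groups(data)]

# All map objects are collected first and converted afterwards; by then
# groupby has already advanced past their items
stale = [list(g) for g in list(lazy_groups(data))]

print(eager)  # [[1, 2, 3], [7, 8]]
print(stale)  # does not match eager
```

This is why the comprehension above works while something like list(consecutive_groups(...)) on the unmodified function would not give usable groups.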

Either way, the issue is that your loop over the consecutive_groups call handles each group separately and reassigns dates on every iteration, so only the last group survives. To collect each one into a larger list, use append:

dates = []
for g in consecutive_groups(...):
    dates.append(list(g))
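
Putting it together, here is a minimal self-contained sketch using only the standard library. The sample list missing and the helper to_ordinal are hypothetical stand-ins for the question's TEST list and its inline lambda.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter


def consecutive_groups(iterable, ordering=lambda x: x):
    """Yield lists of items whose ordering values are consecutive."""
    # index minus ordinal stays constant within a run of consecutive days
    for _, g in groupby(enumerate(iterable), key=lambda x: x[0] - ordering(x[1])):
        yield list(map(itemgetter(1), g))


def to_ordinal(date_string):
    # Hypothetical helper: day number for a 'YYYY-MM-DD' string
    return datetime.strptime(date_string, '%Y-%m-%d').toordinal()


# Hypothetical sample of missing dates, already sorted
missing = ['2013-01-01', '2013-01-02', '2013-01-03', '2013-11-24']

dates = list(consecutive_groups(missing, to_ordinal))
print(dates)
# [['2013-01-01', '2013-01-02', '2013-01-03'], ['2013-11-24']]
```

The input has to be sorted for this grouping to work, since the key relies on position and day number increasing together.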
Tadhg McDonald-Jensen