I have a couple of lists of tuples like these:

list_1 = [('2023-01-01', 'a'), ('2023-01-02', 'b'), ('2023-01-10', 'c')]
list_2 = [('2023-01-02', 'd'), ('2023-01-05', 'e'), ('2023-01-07', 'f')]
list_3 = [('2023-01-01', 'g'), ('2023-01-03', 'h'), ('2023-01-10', 'i')]

I need to fill in the missing dates in each list with a None value:

list_1 = [('2023-01-01', 'a'),  ('2023-01-02', 'b'),  ('2023-01-03', None), ('2023-01-05', None), ('2023-01-07', None), ('2023-01-10', 'c')]
list_2 = [('2023-01-01', None), ('2023-01-02', 'd'),  ('2023-01-03', None), ('2023-01-05', 'e'), ('2023-01-07', 'f'), ('2023-01-10', None)]
list_3 = [('2023-01-01', 'g'),  ('2023-01-02', None), ('2023-01-03', 'h'),  ('2023-01-05', None), ('2023-01-07', None), ('2023-01-10', 'i')]

The number of tuple elements can vary.

What is the best and most efficient way to do this?

Vau
  • Please [edit] your question and spell out your entire criteria of "*missing dates*". Also do not worry about "*best*" and/or "*most efficient*", until you have working code of some kind. – PM 77-1 Feb 06 '23 at 17:20
  • I don't get the rule... aren't some tuples missing from the expected output of `list_1`? Do the lists need to be considered independently of each other? – cards Feb 06 '23 at 19:57

1 Answer

I cannot prove that it's the "best and most efficient", but here is one approach.

Unfortunately, Python's built-in `datetime` module doesn't have a "get all dates in a range" function, but pandas does.

from pandas import date_range

list_1 = [('2023-01-01', 'a'), ('2023-01-02', 'b'), ('2023-01-10', 'c')]

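# Dates already present in the list
dates_in_list1 = {d for d, _ in list_1}
# Every calendar day between the earliest and latest date, as 'YYYY-MM-DD' strings
dates_in_range = {d.strftime('%Y-%m-%d') for d in date_range(min(dates_in_list1), max(dates_in_list1), freq='D')}
missing_dates = dates_in_range.difference(dates_in_list1)

# Add a (date, None) placeholder for each missing day, then sort by date
new_list_1 = sorted(list_1 + [(d, None) for d in missing_dates])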

print(new_list_1)
# [('2023-01-01', 'a'), ('2023-01-02', 'b'), ('2023-01-03', None), ('2023-01-04', None), ('2023-01-05', None), ('2023-01-06', None), ('2023-01-07', None), ('2023-01-08', None), ('2023-01-09', None), ('2023-01-10', 'c')]
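
If pandas is not available, the same range can probably be built with the standard library alone. This is only a minimal sketch, assuming the dates are ISO-formatted `YYYY-MM-DD` strings; `iso_dates_between` is a hypothetical helper name, not part of any library.

```python
from datetime import date, timedelta

def iso_dates_between(start, end):
    """Return every day from start to end (inclusive) as 'YYYY-MM-DD' strings.

    Hypothetical helper; assumes both arguments are ISO-formatted date strings.
    """
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    n_days = (last - first).days
    return [(first + timedelta(days=i)).isoformat() for i in range(n_days + 1)]

list_1 = [('2023-01-01', 'a'), ('2023-01-02', 'b'), ('2023-01-10', 'c')]

# Dates already present in the list
present = {d for d, _ in list_1}
# Every calendar day between the earliest and latest date
all_dates = iso_dates_between(min(present), max(present))
# Add a (date, None) placeholder for each missing day, then sort by date
new_list_1 = sorted(list_1 + [(d, None) for d in all_dates if d not in present])
```

`date.fromisoformat` requires Python 3.7 or later.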

Important note: everything is done with strings. This works because the dates are in `YYYY-MM-DD` format, so the lexicographical order of the strings corresponds to the chronological order of the dates.
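
If the goal is instead the output shown in the question, where each list is padded only with dates that appear in at least one of the other lists, no date range is needed at all. The following is a sketch under two assumptions that the question does not confirm: every tuple is a `(date, value)` pair, and a date occurs at most once per list (if it repeats, the last value wins). `fill_missing` is a hypothetical name.

```python
def fill_missing(*lists):
    """Pad each list with (date, None) for every date found in any of the lists.

    Assumes each tuple is a (date, value) pair with a 'YYYY-MM-DD' date string.
    """
    # Union of all dates; sorting the strings also sorts them chronologically
    all_dates = sorted({d for lst in lists for d, _ in lst})
    filled = []
    for lst in lists:
        lookup = dict(lst)  # date -> value for this list
        # dict.get returns None for dates this list does not contain
        filled.append([(d, lookup.get(d)) for d in all_dates])
    return filled

list_1 = [('2023-01-01', 'a'), ('2023-01-02', 'b'), ('2023-01-10', 'c')]
list_2 = [('2023-01-02', 'd'), ('2023-01-05', 'e'), ('2023-01-07', 'f')]
list_3 = [('2023-01-01', 'g'), ('2023-01-03', 'h'), ('2023-01-10', 'i')]

new_list_1, new_list_2, new_list_3 = fill_missing(list_1, list_2, list_3)
```

Each list is turned into a dictionary once, so the work is roughly proportional to the number of lists times the number of distinct dates, plus one sort of the distinct dates.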

Stef