-1

I have a list that contains date and the list looks like as follow:

list = ['2020-04-05',
 '2020-04-12',
 '2020-04-19',
 '2020-04-26',
 '2020-05-03',
 '2020-05-10',
 '2020-05-17',
 '2020-05-24',
 '2020-05-31',
 '2020-06-07',
 '2020-06-14',
 '2020-06-21',
 '2020-06-28',
 '2020-07-05',
 '2020-07-12',
 '2020-07-19',
 '2020-07-26',
 '2020-08-02',
 '2020-08-09',
 '2020-08-16',
 '2020-08-23',
 '2020-08-30',
 '2020-09-06',
 '2020-09-20',
 '2020-09-13']

I want to extract the highest date of each month from the list. My desired output looks like as follow:

Desired list = ['2020-04-26','2020-05-31','2020-06-28','2020-07-26','2020-08-30','2020-09-20']

I tried to find the max date of the month using Max() but it gave overall max date from the list. Is there any way of finding max date of each month from the list.

user2293224
  • 2,128
  • 5
  • 28
  • 52
  • Here's the most terse solution I could come up with (uses itertools.groupby and datetime.datetime). `[list(g)[-1] for _, g in groupby(l, key=lambda x: datetime.strptime(x[:-3], '%Y-%m'))]` – marklap Sep 23 '20 at 21:19

3 Answers3

1

You can do

s = pd.Series(pd.to_datetime(l))
new_list = s.groupby(s.dt.strftime('%Y-%m')).max().tolist()
new_list
[Timestamp('2020-04-26 00:00:00'), Timestamp('2020-05-31 00:00:00'), Timestamp('2020-06-28 00:00:00'), Timestamp('2020-07-26 00:00:00'), Timestamp('2020-08-30 00:00:00'), Timestamp('2020-09-20 00:00:00')]
BENY
  • 317,841
  • 20
  • 164
  • 234
1

You may

  • convert each string date to a date object using date.fromisoformat
  • group the values by month
  • for each month retrieve the max
  • convert back to string
from datetime import date
from itertools import groupby
from operator import methodcaller

dates = map(date.fromisoformat, values)
dates_pre_month = groupby(sorted(dates, key=lambda x: x.month), key=lambda x: x.month)

max_per_month = (max(month_values) for month, month_values in dates_pre_month)
max_per_month = list(map(methodcaller("isoformat"), max_per_month))
azro
  • 53,056
  • 7
  • 34
  • 70
  • Nice solution! Suggestion to sort by the complete date instead of date.month to avoid grouping the same month in different years. sorted(dates, key=lambda x: x) – Constanza Quaglia Jul 31 '23 at 14:47
0

Here is how:

dates = ['2020-04-05',
         '2020-04-12',
         '2020-04-19',
         '2020-04-26',
         '2020-05-03',
         '2020-05-10',
         '2020-05-17',
         '2020-05-24',
         '2020-05-31',
         '2020-06-07',
         '2020-06-14',
         '2020-06-21',
         '2020-06-28',
         '2020-07-05',
         '2020-07-12',
         '2020-07-19',
         '2020-07-26',
         '2020-08-02',
         '2020-08-09',
         '2020-08-16',
         '2020-08-23',
         '2020-08-30',
         '2020-09-06',
         '2020-09-20',
         '2020-09-13']


dct = {date.rsplit('-', 1)[0]: [] for date in dates} # Dict with keys as all the dates without the day number, and values as empty lists
for date in dates:
    y, m, d = date.split('-')
    dct[f'{y}-{m}'].append(int(d)) # Add all the days from the months into the empty lists
        
print([f'{ym}-{max(dct[ym])}' for ym in dct]) # Concat the year and months with the greatest date

Output:

['2020-04-26', '2020-05-31', '2020-06-28', '2020-07-26', '2020-08-30', '2020-09-20']
Red
  • 26,798
  • 7
  • 36
  • 58