5

I have a time series that does not contain every calendar date (i.e. it only has trading dates). The series can be reproduced with:

 dates=pd.Series(np.random.randint(100,size=30),index=pd.to_datetime(['2010-01-04', '2010-01-05', '2010-01-06', '2010-01-07',
           '2010-01-08', '2010-01-11', '2010-01-12', '2010-01-13',
           '2010-01-14', '2010-01-15', '2010-01-19', '2010-01-20',
           '2010-01-21', '2010-01-22', '2010-01-25', '2010-01-26',
           '2010-01-27', '2010-01-28', '2010-01-29', '2010-02-01',
           '2010-02-02', '2010-02-03', '2010-02-04', '2010-02-05',
           '2010-02-08', '2010-02-09', '2010-02-10', '2010-02-11',
           '2010-02-12', '2010-02-16']))

I would like the last date of each month that appears in my list of dates, i.e. '2010-01-29' and '2010-02-16'.

I have looked at "Get the last date of each month in a list of dates in Python", and more specifically at this code:

import pandas as pd
import numpy as np

df = pd.read_csv('/path/to/file/', parse_dates=['my_date_field'])  # load the file and parse the date column as datetimes
df.index = df['my_date_field']              # set the dataframe index with your date
dfg = df.groupby(pd.Grouper(freq='M'))      # group by month end ('MS' for month start); pd.Grouper replaces pd.TimeGrouper, which newer pandas no longer provides

# Finally, find the max date in each month
dfg.agg({'my_date_field': np.max})

# To specifically coerce the results of the groupby to a list:
dfg.agg({'my_date_field': np.max})['my_date_field'].tolist()

... but can't quite figure out how to adapt this to my application. Thanks in advance.

wlbsr

2 Answers

2

You can try the following to get your desired output:

import numpy as np
import pandas as pd


dates=pd.Series(np.random.randint(100,size=30),index=pd.to_datetime(['2010-01-04', '2010-01-05', '2010-01-06', '2010-01-07',
           '2010-01-08', '2010-01-11', '2010-01-12', '2010-01-13',
           '2010-01-14', '2010-01-15', '2010-01-19', '2010-01-20',
           '2010-01-21', '2010-01-22', '2010-01-25', '2010-01-26',
           '2010-01-27', '2010-01-28', '2010-01-29', '2010-02-01',
           '2010-02-02', '2010-02-03', '2010-02-04', '2010-02-05',
           '2010-02-08', '2010-02-09', '2010-02-10', '2010-02-11',
           '2010-02-12', '2010-02-16']))

This:

dates.groupby(dates.index.month).apply(pd.Series.tail,1).reset_index(level=0, drop=True)

Or this:

dates[dates.groupby(dates.index.month).apply(lambda s: np.max(s.index))]

Both should yield something like the following:

#2010-01-29    43
#2010-02-16    48

To convert it into a list:

dates.groupby(dates.index.month).apply(pd.Series.tail,1).reset_index(level=0, drop=True).tolist()

Or:

dates[dates.groupby(dates.index.month).apply(lambda s: np.max(s.index))].tolist()

Both yield something like:

#[43, 48]
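
Since the question asks for the dates themselves rather than the values, the same grouping can also keep the index. The following is only a minimal sketch: it assumes the index is a sorted DatetimeIndex, and `sample` is a shortened, made-up stand-in for the full series above.

```python
import pandas as pd

# Hypothetical shortened sample: four trading dates across two months.
sample = pd.Series(
    [10, 20, 30, 40],
    index=pd.to_datetime(['2010-01-28', '2010-01-29', '2010-02-12', '2010-02-16']),
)

# tail(1) keeps the last row of each month group together with its original date.
last_rows = sample.groupby(sample.index.month).tail(1)

# The dates themselves, formatted as strings.
last_dates = last_rows.index.strftime('%Y-%m-%d').tolist()
print(last_dates)
```

Here `last_rows` still carries the values, so both the dates and the numbers are available from one object.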

If you're dealing with a dataset that spans beyond one year, then you will need to group by both year and month. The following should help:

import numpy as np
import pandas as pd


z = ['2010-01-04', '2010-01-05', '2010-01-06', '2010-01-07', 
'2010-01-08', '2010-01-11', '2010-01-12', '2010-01-13', 
'2010-01-14', '2010-01-15', '2010-01-19', '2010-01-20', 
'2010-01-21', '2010-01-22', '2010-01-25', '2010-01-26', 
'2010-01-27', '2010-01-28', '2010-01-29', '2010-02-01', 
'2010-02-02', '2010-02-03', '2010-02-04', '2010-02-05', 
'2010-02-08', '2010-02-09', '2010-02-10', '2010-02-11', 
'2010-02-12', '2010-02-16', '2011-01-04', '2011-01-05', 
'2011-01-06', '2011-01-07', '2011-01-08', '2011-01-11', 
'2011-01-12', '2011-01-13', '2011-01-14', '2011-01-15', 
'2011-01-19', '2011-01-20', '2011-01-21', '2011-01-22', 
'2011-01-25', '2011-01-26', '2011-01-27', '2011-01-28', 
'2011-01-29', '2011-02-01', '2011-02-02', '2011-02-03', 
'2011-02-04', '2011-02-05', '2011-02-08', '2011-02-09', 
'2011-02-10', '2011-02-11', '2011-02-12', '2011-02-16']

dates1 = pd.Series(np.random.randint(100,size=60),index=pd.to_datetime(z))

This:

dates1.groupby([dates1.index.year, dates1.index.month]).apply(pd.Series.tail,1).reset_index(level=(0,1), drop=True)  # keys passed as a list; pandas documents a tuple as a single key

Or:

dates1[dates1.groupby([dates1.index.year, dates1.index.month]).apply(lambda s: np.max(s.index))]  # keys passed as a list; pandas documents a tuple as a single key

Both yield something like:

# 2010-01-29    66
# 2010-02-16    80
# 2011-01-29    13
# 2011-02-16    10
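
A more compact variant of the year-and-month grouping is sketched here, again assuming a sorted index. The two keys are passed as a list, and `tail(1)` stands in for `apply` followed by `reset_index`. The data in `sample` is made up for illustration.

```python
import pandas as pd

# Hypothetical sample spanning two years; the index is assumed to be sorted.
idx = pd.to_datetime(['2010-01-28', '2010-01-29', '2010-02-16',
                      '2011-01-28', '2011-01-29', '2011-02-16'])
sample = pd.Series(range(6), index=idx)

# Group by a list of keys (year, then month) and keep each group's last row.
last_per_month = sample.groupby([sample.index.year, sample.index.month]).tail(1)
print(last_per_month)
```

Because `tail` returns rows with their original index, no extra index levels need to be dropped afterwards.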

I hope this proves useful.

Abdou
  • this does the trick @Abdou. I actually tried this on my own and for some reason could not get it to work... looks like it was the double quotations around both 'groupby' levels. ie: df.groupby((first, second)) – wlbsr Mar 03 '17 at 07:11
  • first example could be used with a dataframe instead of a series. second example possibly cannot be used with a dataframe. – wlbsr Mar 03 '17 at 07:32
  • @wlbsr there are no quotations around the groupby levels. Are you referring to the parentheses? If so, those need to be there. Also, the solution is written for and tested on series just like you've indicated in your question. If you're testing this on a dataframe you're likely going to get varying results. Please share errors from your testing these pieces of code. – Abdou Mar 03 '17 at 10:59
1

You can group by month and take the last value of the index in each group:

print (dates.groupby(dates.index.month).apply(lambda x: x.index[-1]))
1   2010-01-29
2   2010-02-16
dtype: datetime64[ns]

Another solution:

print (dates.groupby(dates.index.month).apply(lambda x: x.index.max()))
1   2010-01-29
2   2010-02-16
dtype: datetime64[ns]

For a list, first convert the dates to strings with strftime:

print (dates.groupby(dates.index.month)
            .apply(lambda x: x.index[-1]).dt.strftime('%Y-%m-%d').tolist())
['2010-01-29', '2010-02-16']

If you need the values on the last date of each month, use iloc:

print (dates.groupby(dates.index.month).apply(lambda x: x.iloc[-1]))
1    55
2    48
dtype: int64

print (dates.groupby(dates.index.month).apply(lambda x: x.iloc[-1]).tolist())
[55, 48]

EDIT:

For year and month, convert the index to monthly periods with to_period:

dates=pd.Series(np.random.randint(100,size=30),index=pd.to_datetime(
          ['2010-01-04', '2010-01-05', '2010-01-06', '2010-01-07',
           '2010-01-08', '2011-01-11', '2011-01-12', '2011-01-13',
           '2012-01-14', '2012-01-15', '2012-01-19', '2012-01-20',
           '2013-01-21', '2013-01-22', '2013-01-25', '2013-01-26',
           '2013-01-27', '2013-01-28', '2013-01-29', '2013-02-01',
           '2014-02-02', '2014-02-03', '2014-02-04', '2014-02-05',
           '2015-02-08', '2015-02-09', '2015-02-10', '2015-02-11',
           '2016-02-12', '2016-02-16']))
#print (dates)
print (dates.groupby(dates.index.to_period('m')).apply(lambda x: x.index[-1]))
2010-01   2010-01-08
2011-01   2011-01-13
2012-01   2012-01-20
2013-01   2013-01-29
2013-02   2013-02-01
2014-02   2014-02-05
2015-02   2015-02-11
2016-02   2016-02-16
Freq: M, dtype: datetime64[ns]

print (dates.groupby(dates.index.to_period('m'))
            .apply(lambda x: x.index[-1]).dt.strftime('%Y-%m-%d').tolist())
['2010-01-08', '2011-01-13', '2012-01-20', '2013-01-29', 
'2013-02-01', '2014-02-05', '2015-02-11', '2016-02-16']           
print (dates.groupby(dates.index.to_period('m')).apply(lambda x: x.iloc[-1]))
2010-01    68
2011-01    96
2012-01    53
2013-01     4
2013-02    16
2014-02    18
2015-02    41
2016-02    90
Freq: M, dtype: int64

print (dates.groupby(dates.index.to_period('m')).apply(lambda x: x.iloc[-1]).tolist())
[68, 96, 53, 4, 16, 18, 41, 90]

EDIT1: If you need to convert the period to an end-of-month datetime:

df = dates.groupby(dates.index.to_period('m')).apply(lambda x: x.index[-1])
df.index = df.index.to_timestamp('m')
print (df)
2010-01-31   2010-01-08
2011-01-31   2011-01-13
2012-01-31   2012-01-20
2013-01-31   2013-01-29
2013-02-28   2013-02-01
2014-02-28   2014-02-05
2015-02-28   2015-02-11
2016-02-29   2016-02-16
dtype: datetime64[ns]
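
If the goal is to keep the actual last trading date as the index, together with its value, one possible sketch is to mark the last row of each monthly period and filter on that. This is an assumption-laden illustration, not part of the answer above: it requires a sorted index, and `sample` is made-up data.

```python
import pandas as pd

# Hypothetical sample; the index is assumed to be sorted by date.
idx = pd.to_datetime(['2010-01-07', '2010-01-08', '2011-01-12',
                      '2011-01-13', '2013-02-01'])
sample = pd.Series([1, 2, 3, 4, 5], index=idx)

# Monthly period of each row, e.g. 2010-01.
periods = sample.index.to_period('M')

# Keep only the rows that are the last occurrence of their period.
last_rows = sample[~periods.duplicated(keep='last')]
print(last_rows)
```

The result is indexed by the real dates (for example 2010-01-08), not by the period or by a calendar month end.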
jezrael
  • all above answers are correct based on my question and example but upon implementation I see that I need to clarify. If the data extends over multiple years what I am looking for is the last day of the month in my list of dates (not all dates) for each month of each year. It looks like grouping by months give me the last date of all months in all years, not the last date of each month in each year. help here? I am thinking possibly group by year and then date but I am not exactly sure of the syntax. – wlbsr Mar 02 '17 at 23:50
  • yes, that is what I was looking for except the date / index becomes 2009-12 rather than preserving the actual date 2009-12-31 – wlbsr Mar 03 '17 at 07:06
  • I am not sure if understand, you need convert period to end day of month? See last edit. – jezrael Mar 03 '17 at 07:13
  • actually like this answer a little better than the one above but look at the other answer and specifically the output below where it says: 'Both yield something like:' – wlbsr Mar 03 '17 at 07:46
  • ok, but you ask `I would like the last day of the month in my list of dates ie: '2010-01-29' and '2010-02-16'`. So i think it is your desired output. If need something else, the best is create new question with sample. desired output and what you try. Because without it is very hard answering... – jezrael Mar 03 '17 at 07:55