Pandas Subset of a Time Series Without Resampling

Question

I have a pandas DataFrame holding the cumulative daily returns of a series:

Date    CumReturn
3/31/2017    1
4/3/2017     .99
4/4/2017     .992
 ...        ...
4/28/2017    1.012
5/1/2017     1.011
 ...         ...
5/31/2017    1.022
 ...         ...
6/30/2017    1.033
 ...         ...

I want only the month-end values.

Date    CumReturn
4/28/2017    1.012
5/31/2017    1.022
6/30/2017    1.033

Because I want only the month-end values, resampling doesn't work as it aggregates the interim values.

What is the easiest way to get only the month end values as they appear in the original dataframe?

How would I do that? I'm a little shaky on timeseries manipulation. – Windstorm1981 Jan 05 '18 at 21:18
It's pretty well documented here: http://pandas.pydata.org/pandas-docs/stable/timeseries.html Something like this: `from pandas.tseries.offsets import MonthEnd; df['EndOfMonth'] = pd.to_datetime(df['Date'], format="%m/%d/%Y") + MonthEnd(1)`, where MonthEnd(1) moves each date forward to the next month end. – user8834780 Jan 05 '18 at 21:27
Thanks for the reference to the documentation. I did have a look at that before posting the question, but it didn't seem to answer it. If I understand your comment correctly, I would derive a list or series of month-end values using the offsets and then use that to get a subset of the dataframe? – Windstorm1981 Jan 05 '18 at 21:28
Doing the above will get you EndOfMonth for each Date, so you will need to subset to keep only the records where Date == EndOfMonth. – user8834780 Jan 05 '18 at 21:30
Why is 4/28/2017 in your desired output, but not 3/31/2017? 4/28 isn't the end of a month, but 3/31 is. – root Jan 05 '18 at 21:32
I'm being a bit sloppy. 4/28 is the last business date of that month, so I could use the BM offset. And I don't need 3/31 since it's the first date, but no worries if I get it. – Windstorm1981 Jan 05 '18 at 21:37

root · Answer 1 · 2018-01-05T21:43:51.220

3

Use the is_month_end component of the .dt date accessor:

# Ensure the date column is a Timestamp
df['Date'] = pd.to_datetime(df['Date'])

# Filter to end of the month only
df = df[df['Date'].dt.is_month_end]

Applying this to the data you provided:

        Date  CumReturn
0 2017-03-31      1.000
5 2017-05-31      1.022
6 2017-06-30      1.033
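
For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame is a hypothetical reconstruction of the seven rows visible in the question (the elided rows are left out), so the index labels only line up with the output above under that assumption.

```python
import pandas as pd

# Hypothetical reconstruction of the rows shown in the question
df = pd.DataFrame({
    "Date": ["3/31/2017", "4/3/2017", "4/4/2017", "4/28/2017",
             "5/1/2017", "5/31/2017", "6/30/2017"],
    "CumReturn": [1.0, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033],
})

# Parse the month/day/year strings into Timestamps
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Boolean mask: True only where the date is the calendar month end
mask = df["Date"].dt.is_month_end
month_end = df[mask]
print(month_end)
```

Note that 4/28/2017 is dropped here because it is the last business day of April, not the last calendar day.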

EDIT

To get business month end, compare using BMonthEnd(0):

from pandas.tseries.offsets import BMonthEnd

# Ensure the date column is a Timestamp
df['Date'] = pd.to_datetime(df['Date'])

# Filter to business month end only
df = df[df['Date'] == df['Date'] + BMonthEnd(0)]

Applying this to the data you provided:

        Date  CumReturn
0 2017-03-31      1.000
3 2017-04-28      1.012
5 2017-05-31      1.022
6 2017-06-30      1.033
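
A runnable sketch of the same comparison, again on a hypothetical seven-row reconstruction of the question's data. It also illustrates why the offset is built with n=0: as far as I understand the offset semantics, BMonthEnd(0) leaves a date that is already a business month end unchanged, whereas BMonthEnd(1) moves it on to the following month's end, so the equality test would never match.

```python
import pandas as pd
from pandas.tseries.offsets import BMonthEnd

# Hypothetical reconstruction of the rows shown in the question
df = pd.DataFrame({
    "Date": ["3/31/2017", "4/3/2017", "4/4/2017", "4/28/2017",
             "5/1/2017", "5/31/2017", "6/30/2017"],
    "CumReturn": [1.0, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# n=0 rolls each date forward to its business month end,
# leaving dates that already sit on one unchanged
rolled = df["Date"] + BMonthEnd(0)

# Keep only the rows whose date did not move
bm_end = df[df["Date"] == rolled]
print(bm_end)

# Contrast n=0 with n=1 on a date that is already a business month end
april_end = pd.Timestamp("2017-04-28")
print(april_end + BMonthEnd(0))
print(april_end + BMonthEnd(1))
```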

edited Jan 05 '18 at 21:43

answered Jan 05 '18 at 21:35

root


Thanks for this. Can you provide a document reference? I didn't come across that in my searches. – Windstorm1981 Jan 05 '18 at 21:39
@Windstorm1981: see the links in my answer. `is_month_end` won't work if you want _business_ month end though. – root Jan 05 '18 at 21:40
Yes that might be problematic. What I really need is some way to subset the dataframe taking the max date per month. That is because this is business data and some month-end dates fall on the weekends. I can subset the dataframe by month and loop through to get the max but it seems like there should be an easier way. I just saw the second answer (below). Though a bit less elegant it might be a better solution to my particular problem. – Windstorm1981 Jan 05 '18 at 21:44
I added a solution to get business month end. If all you really care about is the _max_ date per month, regardless of if it's a month end, business month end, or neither, use @Evan's solution. Please formulate your questions more precisely in the future. – root Jan 05 '18 at 21:46
This is really nice. I will delete my answer; pandas' behavior is worth noting but the solution is not correct. – Evan Jan 06 '18 at 00:12

score 2 · Answer 2 · answered Jan 05 '18 at 22:04


df.sort_values('Date').groupby([df.Date.dt.year,df.Date.dt.month]).last()
Out[197]: 
                Date  CumReturn
Date Date                      
2017 3    2017-03-31      1.000
     4    2017-04-28      1.012
     5    2017-05-31      1.022
     6    2017-06-30      1.033
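
A self-contained sketch of this grouping, under a few assumptions: the data is a hypothetical reconstruction of the question's visible rows, plus one made-up partial month (7/3/2017) to show that the result is simply the latest row per month, whether or not it is a month end. The group keys are renamed to `year` and `month` purely for readability; the answer's one-liner leaves both index levels named `Date`.

```python
import pandas as pd

# Hypothetical reconstruction of the question's rows, plus an
# invented partial July to show the "latest row per month" behavior
df = pd.DataFrame({
    "Date": ["3/31/2017", "4/3/2017", "4/4/2017", "4/28/2017",
             "5/1/2017", "5/31/2017", "6/30/2017", "7/3/2017"],
    "CumReturn": [1.0, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033, 1.04],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Group keys: calendar year and month of each row
year = df["Date"].dt.year.rename("year")
month = df["Date"].dt.month.rename("month")

# Sort chronologically, then take the last row in each (year, month) group
last_per_month = df.sort_values("Date").groupby([year, month]).last()
print(last_per_month)
```

The July group returns 7/3/2017, which is neither a calendar nor a business month end. That matches the "max date per month" requirement from the comments, but it differs from the offset-based filters.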


BENY



score 1 · Answer 3 · answered Jan 05 '18 at 22:04

Assuming the DataFrame is already sorted by 'Date' and that column holds pandas timestamps, you can convert the dates to YYYY-MM strings for grouping and take the last value in each group:

df.groupby(df['Date'].dt.strftime('%Y-%m'))['CumReturn'].last()

# Example output:
# 2017-01    0.127002
# 2017-02    0.046894
# 2017-03    0.005560
# 2017-04    0.150368
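
To see what this returns on the question's data, here is a small sketch using a hypothetical reconstruction of the visible rows. The `tail(1)` line at the end is a variation that is not part of the answer: it keeps the original rows, including the 'Date' column, instead of a Series indexed by the YYYY-MM strings.

```python
import pandas as pd

# Hypothetical reconstruction of the rows shown in the question,
# already in chronological order as the answer assumes
df = pd.DataFrame({
    "Date": ["3/31/2017", "4/3/2017", "4/4/2017", "4/28/2017",
             "5/1/2017", "5/31/2017", "6/30/2017"],
    "CumReturn": [1.0, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Group key: one "YYYY-MM" string per row
key = df["Date"].dt.strftime("%Y-%m")

# Last CumReturn per month, indexed by the month string
last_values = df.groupby(key)["CumReturn"].last()
print(last_values)

# Variation (not in the answer): keep the full original rows instead
last_rows = df.groupby(key).tail(1)
print(last_rows)
```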
