0

Based on this thread: Pandas Subset of a Time Series Without Resampling

The goal is to return the latest date in a month (with a value), and return that value.

Sample code:

Date    CumReturn
3/31/2017    1
4/3/2017     .99
5/31/2017    1.022
4/4/2017     100
4/28/2017    1.012
5/1/2017     1.011
6/30/2017    1.033

import pandas as pd
df = pd.read_clipboard(parse_dates = ['Date'])
df.set_index('Date')
df

I thought this would work:

df.groupby(pd.Grouper(freq = 'M')).max()

But it returns the dates corresponding to the highest values (CumReturn), rather than the max dates in the index.

df.groupby(pd.Grouper(freq = 'M')).last()

However, the output shows that the last day in April is chosen, rather than the latest day in the df. pandas assigns the value from April 28 to April 30, and returns this df:

    CumReturn
Date    
2017-03-31  1.000
2017-04-30  1.012
2017-05-31  1.022
2017-06-30  1.033

What causes this behavior? I assume pandas is just picking the latest date in each month, but that seems odd since those dates aren't present in the original data.

Evan
  • 2,121
  • 14
  • 27
  • What do you think should happen instead? If you were to plot a histogram and the latest data you had available in the month was on the 3rd, what would it look like? If you categorise by month, shouldn't it cover the whole month? – roganjosh Jan 06 '18 at 00:33
  • I thought that pandas would pick the latest day present in the grouped data - so, per your example, the 3rd. In this case, the data for the 3rd is assigned to the 30th (etc.). – Evan Jan 06 '18 at 00:36
  • But you grouped by month. As I said in my comment, it would be nonsensical to report that value at the maximum reported date within that month, rather than the month end. I'm trying to think of an analogy. – roganjosh Jan 06 '18 at 00:38
  • I am confused as to the difference between `Grouper` and `groupby`, then; for example, in the linked question, grouping by year and month returns the last value present in the dataframe: `df.sort_values('Date').groupby([df.Date.dt.year, df.Date.dt.month]).last()` – Evan Jan 06 '18 at 00:58
  • 1
    Aha, ok, that makes it easier :) . `groupby` uses your raw data, `Grouper` _categorises_ your data (i.e. in which month it falls in this example) – roganjosh Jan 06 '18 at 01:01
  • Thanks. If you want to post an answer, I will mark it as accepted. Cheers! – Evan Jan 08 '18 at 15:05

0 Answers0