
I'd like to change the value associated with the first date present in every month of a pandas.Series I have. For example, given something like this:

Date
1984-01-03    0.992701
1984-01-04    1.003614
1984-01-17    0.994647
1984-01-18    1.007440
1984-01-27    1.006097
1984-01-30    0.991546
1984-01-31    1.002928
1984-02-01    1.009894
1984-02-02    0.996608
1984-02-03    0.996595
                ...

I'd like to change the values associated with 1984-01-03, 1984-02-01 and so on. I've racked my brain for hours on this one and have looked around Stack Overflow a fair bit. Some solutions have come close. For example, using:

[In]: series.groupby([series.index.year, series.index.month]).first()

[Out]:

Date  Date
1984  1       0.992701
      2       1.009894
      3       1.005963
      4       0.997899
      5       1.000342
      6       0.995429
      7       0.994620
      8       1.019377
      9       0.993209
      10      1.000992
      11      1.009786
      12      0.999069
1985  1       0.981220
      2       1.011928
      3       0.993042
      4       1.015153
                ...

This is almost there, but I'm struggling to take it further.

What I'd like to do is set the value on the first date present in each month of every year to 1.

series[series.index.is_month_start] = 1 comes close, but the problem is that is_month_start only selects rows whose calendar day is 1. In my series that isn't always the case: for example, the first date present in January 1984 is 1984-01-03.
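A quick check on a small excerpt of the dates above (values copied from the listing; the rebuilt series is only for illustration) shows the gap: `is_month_start` tests for calendar day 1, so it catches 1984-02-01 but misses 1984-01-03, the first date actually present in January.

```python
import pandas as pd

# Small excerpt of the question's data (dates and values copied from the listing)
idx = pd.DatetimeIndex(["1984-01-03", "1984-01-04", "1984-01-31",
                        "1984-02-01", "1984-02-02"], name="Date")
series = pd.Series([0.992701, 1.003614, 1.002928, 1.009894, 0.996608], index=idx)

# is_month_start is True only where the calendar day is the 1st of the month
mask = series.index.is_month_start
print(series[mask])  # only 1984-02-01 is selected; 1984-01-03 is missed
```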

series.groupby(pd.TimeGrouper('BM')).nth(0) doesn't appear to return the first day either; instead, I get the last day:

Date
1984-01-31    0.992701
1984-02-29    1.009894
1984-03-30    1.005963
1984-04-30    0.997899
1984-05-31    1.000342
1984-06-29    0.995429
1984-07-31    0.994620
1984-08-31    1.019377
                ...
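Looking at that output more closely, the values are in fact the first values of each month (0.992701 is the 1984-01-03 value and 1.009894 is the 1984-02-01 value); only the index shows the business-month-end bin label instead of the original date. `pd.TimeGrouper` has also since been removed from pandas in favour of `pd.Grouper`. A minimal sketch that keeps the original dates groups by the monthly period and takes `head(1)`, which returns the first row of each group with its original index label. It assumes the index is sorted ascending, so the first row in each month is also its earliest date.

```python
import pandas as pd

idx = pd.DatetimeIndex(["1984-01-03", "1984-01-04", "1984-01-31",
                        "1984-02-01", "1984-02-02"], name="Date")
series = pd.Series([0.992701, 1.003614, 1.002928, 1.009894, 0.996608], index=idx)

# Group by calendar month (as a Period such as 1984-01) and keep the first
# row of each group; head(1) preserves the original DatetimeIndex labels.
firsts = series.groupby(series.index.to_period("M")).head(1)

# Overwrite the first value present in each month with 1
series.loc[firsts.index] = 1
print(series)
```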

I'm completely stumped. Your help is, as always, greatly appreciated! Thank you.

Bango

1 Answer


One way would be to use your idea of grouping by (year, month), but call idxmin on the index itself, converted into a Series. Because the grouped values are the dates themselves, idxmin returns the earliest date present in each month:

In [74]: s.index.to_series().groupby([s.index.year, s.index.month]).idxmin()
Out[74]: 
Date  Date
1984  1      1984-01-03
      2      1984-02-01
Name: Date, dtype: datetime64[ns]

In [75]: start = s.index.to_series().groupby([s.index.year, s.index.month]).idxmin()

In [76]: s.loc[start] = 999

In [77]: s
Out[77]: 
Date
1984-01-03    999.000000
1984-01-04      1.003614
1984-01-17      0.994647
1984-01-18      1.007440
1984-01-27      1.006097
1984-01-30      0.991546
1984-01-31      1.002928
1984-02-01    999.000000
1984-02-02      0.996608
1984-02-03      0.996595
dtype: float64
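For reference, here is the same approach as a self-contained script on a small excerpt of the question's data, setting the values to 1 (as the question asks) rather than 999. `start` holds the earliest date of each (year, month) group, and those dates are then used as `.loc` labels.

```python
import pandas as pd

idx = pd.DatetimeIndex(["1984-01-03", "1984-01-04", "1984-01-17",
                        "1984-02-01", "1984-02-02"], name="Date")
s = pd.Series([0.992701, 1.003614, 0.994647, 1.009894, 0.996608], index=idx)

# Turn the index into a Series whose values equal its labels, so it can be
# grouped; idxmin per (year, month) group gives the earliest date present.
dates = s.index.to_series()
start = dates.groupby([s.index.year, s.index.month]).idxmin()

# start's values are Timestamps; use them as labels to overwrite those rows
s.loc[start.values] = 1
print(s)
```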
DSM
  • that appears to have worked, thank you! Out of curiosity, is there a more efficient/faster way of doing this? Not that I require it for this task, more for my general understanding. – Bango Jul 13 '17 at 00:39
  • @Bango please consider up-voting this answer as well. Thx https://stackoverflow.com/help/someone-answers – piRSquared Jul 13 '17 at 00:41
  • `s.index.to_series().groupby(pd.Grouper(freq='M')).min()` – piRSquared Jul 13 '17 at 00:43
  • @piRSquared thanks for the alternative code, and upvoted :) – Bango Jul 13 '17 at 09:37
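On the question in the comments about a faster approach: one vectorized sketch (not taken from the answer or the comments) skips the groupby entirely and marks the first occurrence of each calendar month with `Index.duplicated`. It assumes the series is sorted by date, so the first occurrence of a month is also its earliest date; whether it is actually faster would need timing on real data.

```python
import pandas as pd

idx = pd.DatetimeIndex(["1984-01-03", "1984-01-04", "1984-01-31",
                        "1984-02-01", "1984-02-02"], name="Date")
s = pd.Series([0.992701, 1.003614, 1.002928, 1.009894, 0.996608], index=idx)

# Ensure chronological order so "first occurrence" means "earliest date"
s = s.sort_index()

# Map each date to its monthly Period (e.g. 1984-01); duplicated() flags every
# repeat after the first, so its negation marks the first date in each month.
first_in_month = ~s.index.to_period("M").duplicated()

s[first_in_month] = 1
print(s)
```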