Let's get right into the question. The following is the daily data:
AAA BBB CCC
date
2012-04-16 44.48 28.48 17.65
2012-04-17 44.59 28.74 17.65
2012-04-18 44.92 28.74 17.72
2012-04-19 44.92 28.62 17.72
2012-04-20 45.09 28.68 17.71
2012-04-23 45.09 28.40 17.76
2012-04-24 45.09 28.51 17.73
2012-04-25 45.01 28.76 17.73
2012-04-26 45.40 28.94 17.76
2012-04-27 45.57 29.02 17.79
2012-04-30 45.45 28.90 17.80
2012-05-01 45.79 29.07 17.80
2012-05-02 45.71 28.98 17.77
2012-05-03 45.44 28.81 17.79
2012-05-04 45.05 28.48 17.79
2012-05-07 45.05 28.48 17.79
2012-05-08 45.00 28.40 17.93
2012-05-09 44.87 28.30 17.94
2012-05-10 44.93 28.34 17.85
2012-05-11 44.86 28.30 17.96
... ... ...
I want to select the rows starting from the first row with a monthly increment, that is, the rows whose index is 2012-04-16, 2012-05-16, 2012-06-16, ... . I can just use relativedelta and manually add them but I'm wondering if there is a more efficient method. I tried resampling, but I could only choose the first or last of each month as in df.resample('M').first()
.
What makes the problem more complicated is that some of the dates are missing; they are business days but not those of U.S.. There are several ways to handle this problem:
Choose the exact date or the earlier one closest to the date. If such date is nonexistent, then start looking up for the later dates.
Choose the exact date or the later one closest to the date. If such date is nonexistent, then start looking up for the earlier dates.
Choose the closest date to the exact date regardless of being early or late; I can use
min(df.index, key=lambda x: abs(x - (df.index[0] + relativedelta(months=1)))
.
And in each of these cases, I wonder which method is the most efficient and easy to read. In the last code example, the month is a variable so I'm not sure if I can make it as a lambda procedure and use 'apply'.
Thanks in advance.