24

I have a dataframe with monthly financial data:

In [89]: vfiax_monthly.head()
Out[89]: 
            year  month  day       d   open  close   high    low  volume  aclose
2003-01-31  2003      1   31  731246  64.95  64.95  64.95  64.95       0   64.95
2003-02-28  2003      2   28  731274  63.98  63.98  63.98  63.98       0   63.98
2003-03-31  2003      3   31  731305  64.59  64.59  64.59  64.59       0   64.59
2003-04-30  2003      4   30  731335  69.93  69.93  69.93  69.93       0   69.93
2003-05-30  2003      5   30  731365  73.61  73.61  73.61  73.61       0   73.61

I'm trying to calculate the returns like that:

In [90]: returns = (vfiax_monthly.open[1:] - vfiax_monthly.open[:-1])/vfiax_monthly.open[1:]

But I'm getting only zeroes:

In [91]: returns.head()
Out[91]: 
2003-01-31   NaN
2003-02-28     0
2003-03-31     0
2003-04-30     0
2003-05-30     0
Freq: BM, Name: open

I think that's because the arithmetic operations get aligned on the index and that makes the [1:] and [:-1] useless.

My workaround is:

In [103]: returns = (vfiax_monthly.open[1:].values - vfiax_monthly.open[:-1].values)/vfiax_monthly.open[1:].values

In [104]: returns = pd.Series(returns, index=vfiax_monthly.index[1:])

In [105]: returns.head()
Out[105]: 
2003-02-28   -0.015161
2003-03-31    0.009444
2003-04-30    0.076362
2003-05-30    0.049993
2003-06-30    0.012477
Freq: BM

Is there a better way to calculate the returns? I don't like the conversion to array and then back to Series.

Daniel
  • 26,899
  • 12
  • 60
  • 88

4 Answers4

39

Instead of slicing, use .shift to move the index position of values in a DataFrame/Series. For example:

returns = (vfiax_monthly.open - vfiax_monthly.open.shift(1))/vfiax_monthly.open.shift(1)

This is what pct_change is doing under the bonnet. You can also use it for other functions e.g.:

(3*vfiax_monthly.open + 2*vfiax_monthly.open.shift(1))/5

You might also want to looking into the rolling and window functions for other types of analysis of financial data.

Jinhua Wang
  • 1,679
  • 1
  • 17
  • 44
Matti John
  • 19,329
  • 7
  • 41
  • 39
  • 5
    I think the first line should read: returns = (vfiax_monthly.open - vfiax_monthly.open.shift(1))/vfiax_monthly.open.shift(1) – DonCristobal Jul 06 '15 at 22:55
26

The easiest way to do this is to use the DataFrame.pct_change() method.

Here is a quick example

In[1]: aapl = get_data_yahoo('aapl', start='11/1/2012', end='11/13/2012')

In[2]: appl
Out[2]: 
          Open    High     Low   Close    Volume  Adj Close
Date                                                           
2012-11-01  598.22  603.00  594.17  596.54  12903500     593.83
2012-11-02  595.89  596.95  574.75  576.80  21406200     574.18
2012-11-05  583.52  587.77  577.60  584.62  18897700     581.96
2012-11-06  590.23  590.74  580.09  582.85  13389900     580.20
2012-11-07  573.84  574.54  555.75  558.00  28344600     558.00
2012-11-08  560.63  562.23  535.29  537.75  37719500     537.75
2012-11-09  540.42  554.88  533.72  547.06  33211200     547.06
2012-11-12  554.15  554.50  538.65  542.83  18421500     542.83
2012-11-13  538.91  550.48  536.36  542.90  19033900     542.90

In[3]: aapl.pct_change()
Out[3]:
                Open      High       Low     Close    Volume  Adj Close
Date                                                                   
2012-11-01       NaN       NaN       NaN       NaN       NaN        NaN
2012-11-02 -0.003895 -0.010033 -0.032684 -0.033091  0.658945  -0.033090
2012-11-05 -0.020759 -0.015378  0.004959  0.013558 -0.117186   0.013550
2012-11-06  0.011499  0.005053  0.004311 -0.003028 -0.291453  -0.003024
2012-11-07 -0.027769 -0.027423 -0.041959 -0.042635  1.116864  -0.038263
2012-11-08 -0.023020 -0.021426 -0.036815 -0.036290  0.330747  -0.036290
2012-11-09 -0.036049 -0.013073 -0.002933  0.017313 -0.119522   0.017313
2012-11-12  0.025406 -0.000685  0.009237 -0.007732 -0.445323  -0.007732
2012-11-13 -0.027502 -0.007250 -0.004251  0.000129  0.033244   0.000129
spencerlyon2
  • 9,476
  • 4
  • 30
  • 39
  • I like this solution. But it's specific to my use case. What if I want to calculate the average between every pair of months (or something complicated which is has no built-in pandas function): `(vfiax_monthly.open[1:] - vfiax_monthly.open[:-1])/2` – Daniel Nov 14 '12 at 21:08
  • Could you add an example of 'something complicated' to the original post? Assuming you had a `DateTimeIndex` with regular frequency you could always use `df.resample` to aggregate the data at another regular frequency (like every two months) and then use `df.pct_change()` to get the returns. Also there are various options for `pct_change()` [see `periods`, `freq`] that allow you to specify how many data points should be used to compute the returns (`periods` defaults to 1, which is why the solution gave the same answer as your function). – spencerlyon2 Nov 14 '12 at 21:49
  • `(vfiax_monthly.open[1:] + vfiax_monthly.open[:-1])/2` was an example although maybe there's some kind of a window mean function. But let's say I need: `(3*vfiax_monthly.open[1:] + 2*vfiax_monthly.open[:-1])/5`. Now I'm realizing that the choice of the result index is arbitrary so maybe the magic function that i'm looking for does not exist. – Daniel Nov 14 '12 at 23:28
  • (vfiax_monthly.open[1:] - vfiax_monthly.open[:-1])/vfiax_monthly.open[1:] is not the percentage change but the return on investment. Percentage change is the current row divided by the previous one, which would be the equivalent of vfiax_monthly.open[1:]/vfiax_monthly.open[:-1]-1 so in any case pct_change() would be wrong. – Matti John Nov 15 '12 at 00:53
8

The best way to calculate forward looking returns without any chance of bias is to use the built in function pd.DataFrame.pct_change(). In your case all you need to use is this function since you have monthly data, and you are looking for the monthly return.

If, for example, you wanted to look at the 6 month return, you would just set the param df.pct_change(periods = 6) and that will give you the 6 month percent return.

Because you have a relatively small data set, the easiest way is to resample on the parameters that you need to calculate the data on then use the pct_change() function again.

However because of the nice properties of log it is common to use the formula for calculating returns (if you plan on computing statistics on the return series):

enter image description here

Which you would implement as such:

log_return = np.log(vfiax_monthly.open / vfiax_monthly.open.shift())

rgalbo
  • 4,186
  • 1
  • 19
  • 29
1

Could also use a mix of diff and shift methods of pandas series:

retrun = vfiax_monthly.open.diff()/vfiax_monthly.open.shift(1)
mallet
  • 2,454
  • 3
  • 37
  • 64