21

I have a time series `Ser` and I want to compute volatilities (standard deviations) over a rolling window. My current code does this correctly, in the following form:

import numpy as np

w = 10
length = len(Ser) - w + 1  # number of complete windows
volList = []
for timestep in range(length):
    subSer = Ser[timestep:timestep + w]
    mean_i = np.mean(subSer)
    vol_i = (np.sum((subSer - mean_i)**2) / len(subSer))**0.5
    volList.append(vol_i)

This seems very inefficient to me. Does pandas have built-in functionality for doing something like this?

Mad Physicist
Thegamer23
  • Improvement of working code should be in CodeReview.StackExchange.com – Prune Apr 07 '17 at 18:01
  • 2
    @Prune. Given my answer below, I think this question qualifies for SO. OP is really asking if there is a built-in method for doing a sliding window. The code above is really there just to demonstrate effort. – Mad Physicist Apr 07 '17 at 18:02
  • No problem. I didn't convince myself hard enough to slap it with a closure flag. – Prune Apr 07 '17 at 18:09
  • 1
    @Prune it really ends up being about pandas usage. I added the tag; maybe the person answering can clean up the title. – pvg Apr 07 '17 at 18:13
  • I've fixed up the title and the wording to be pretty unambiguously on-topic for SO. Hope you like it. – Mad Physicist Apr 07 '17 at 18:17
  • Are we positive the OP is using pandas? Only numpy is mentioned in the code. While I think that's definitely the right way to go, I think that editing the question to ask about the use of a library which, as near as I can tell, the OP didn't use is a little... off. And I'll continue to believe that even if it turns out the guess was right. – DSM Apr 07 '17 at 18:19
  • @DSM. If you really believe that, you should vote to close the question because it is very unclear at best at that point. – Mad Physicist Apr 07 '17 at 18:22
  • Actually it was a great idea. Sometimes when a question does not seem clear, it is because the OP does not know how to proceed, and more expert users can help in finding the most efficient way. – Thegamer23 Apr 09 '17 at 19:08
  • May I ask how you ended up calculating the volatility? Do you have some example of your final code calculating this? – kramer65 Aug 24 '17 at 02:54

4 Answers

27

Typically, [finance-type] people quote volatility in annualized terms of percent changes in price.

Assuming you have daily prices in a dataframe df and there are 252 trading days in a year, something like the following is probably what you want:

df.pct_change().rolling(window_size).std()*(252**0.5)
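For a self-contained illustration, here is a minimal sketch; the synthetic price data, the column name price, and the window size are assumptions for the example, not part of the original answer:

import numpy as np
import pandas as pd

# Hypothetical daily prices for demonstration purposes only
rng = np.random.default_rng(0)
df = pd.DataFrame({'price': 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500)))})

window_size = 21  # the choice of window is up to you
ann_vol = df['price'].pct_change().rolling(window_size).std() * (252 ** 0.5)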

aaron
  • 1
    Why are we supposed to square root the number of trading days? – zipline86 Jun 13 '20 at 11:42
  • @zipline86. Probably because the standard deviation is a square root. You're normalizing the average under the square root by the total number of days. – Mad Physicist May 19 '21 at 14:46
    The square root comes from the fact that expected movements do not scale linearly with the number of days. We have the expected moves per day (pct change). You took the 'std' of that. So what you have is a 'per day' std value. If you want to transform it into the expected move for a whole year, you multiply it by the square root of the number of days. Why exactly it is the square root, I cannot explain. But it is easy to see intuitively that multiplying it by the number of days would give you too big a number. (It would imply the stock moves in the same direction every single day.) – Kris Jun 06 '21 at 15:08
  • Is the reason annualized volatility? https://www.wallstreetmojo.com/volatility-formula/ – J.D Jun 30 '21 at 05:35
  • Well, not entirely correct. Volatility is quoted in annualized terms for option pricing, for example. For VaR (value-at-risk) calculations, it should be assumed daily. So, in essence, [finance-type] people know that each instrument has its own annoying peculiarities. Thankfully, once you know that, the conversion is simple: `vol_year = vol_day * sqrt(252)`. It is assumed there are 252 trading days in a year. Conventions. – jfaleiro Nov 11 '21 at 19:11
  • @aaron - why did you choose not to use `ddof = 0` for the standard deviation calculation, like in the answer above? Thanks – Rocky the Owl Feb 01 '22 at 12:05
24

It looks like you are looking for Series.rolling. You can apply the std calculations to the resulting object:

roller = Ser.rolling(w)
volList = roller.std(ddof=0)

If you don't plan on using the rolling window object again, you can write a one-liner:

volList = Ser.rolling(w).std(ddof=0)

Keep in mind that ddof=0 is necessary in this case because pandas normalizes the standard deviation by the window length minus ddof, ddof defaults to 1 in pandas, and your loop normalizes by the full window length.
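As a quick sanity check, here is a sketch (the sample data is an assumption for illustration) comparing the rolling result against a manual population standard deviation over the last window:

import numpy as np
import pandas as pd

Ser = pd.Series(np.random.randint(0, 9, 20))
w = 10

rolled = Ser.rolling(w).std(ddof=0)

# Manual population standard deviation over the last window
sub = Ser.iloc[-w:]
manual = (np.sum((sub - sub.mean()) ** 2) / w) ** 0.5
assert np.isclose(rolled.iloc[-1], manual)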

Mad Physicist
  • Man, I wish I had known about `ddof` before I spent four hours looking for bugs! :) – Jonas Byström Sep 04 '18 at 21:06
  • 1
    To me it's not obvious what ddof does by reading the documentation; perhaps those versed in maths on a higher level know the term "Delta Degrees of Freedom." Only testing code gets me proper understanding. Also, reading *all* the documentation is usually an absolute waste of time if you want to get things done. – Jonas Byström Sep 13 '19 at 08:14
  • @Mad Physicist - does this calculation yield the DAILY volatility (because we have note multiplied by the sqrt(252) to annualize)? Thanks – Rocky the Owl Feb 01 '22 at 12:07
  • 1
    @RockytheOwl. It's the 10-day volatility, based on w=10 – Mad Physicist Feb 01 '22 at 12:35
  • Many thanks for the reply @MadPhysicist! Okay, I suppose that makes sense. What happens if we multiply it by sqrt(252) though? Then it becomes the 10-day volatility annualized to a year? Apologies, have been searching around and keep getting confused. Thanks – Rocky the Owl Feb 01 '22 at 12:54
  • @RockytheOwl. Are you assuming gaussian random walk? – Mad Physicist Feb 01 '22 at 12:55
  • @MadPhysicist - I think so...? I was just working with a dataframe of stock closing price values and was told to 'calculate the volatility'. This standard deviation of the log returns formula is the one I want to use (so I think this assumes gaussian random walk, but correct me if wrong). A search on the implementation led me to your answer. Just getting confused about this annualization to a year for an arbitrary sized window. That is we just take any (sensible) window size and get standard deviation of returns, but now just wondering what it means if I multiply by sqrt(252) factor? – Rocky the Owl Feb 01 '22 at 14:13
  • @RockytheOwl. Assuming a Gaussian random walk, you can say that the variance will increase linearly, so after one year, the variance will be 25.2 times larger than for a 10 day interval. The volatility will therefore be ~sqrt(25.2) times larger. – Mad Physicist Feb 01 '22 at 15:18
  • I just want to add this. If you have a small sample and you try to estimate the true volatility of a big population, then you normalize by N-1, as usual. But if you have all the necessary historical data and you try to calculate the true historical volatility, then you normalize by N instead. To normalize by N, use `.std(ddof=0)`; to normalize by N-1, use `.std(ddof=1)`, or equivalently `.std()`. – Orvar Korvar May 11 '22 at 21:04
  • @MadPhysicist I don't understand why you annualize by multiplying with sqrt(252). mcguip below multiplies with sqrt(252/21). So if I want to annualize a rolling volatility of 21 days, should I multiply with sqrt(252/21) or with sqrt(252)? – Orvar Korvar May 12 '22 at 22:18
  • @OrvarKorvar. If you assume a Gaussian random walk, variance increases linearly. If you want to take a 1-day variance, it will be 1/252 or so of the annual variance. If you have a 21-day variance, it will be 21/252 or so of the annual variance. Standard deviation is the square root of variance. That should be all the information you need to decide which factor to use. – Mad Physicist May 12 '22 at 23:32
  • @MadPhysicist just to clarify, assuming that the timeseries frequencies are daily, then by using w=10 you are calculating daily std using a 10-days sample and therefore if you want to annualize you need to multiply by sqrt(252) and not by sqrt(25.2) – MaPy Jan 08 '23 at 10:14
17

"Volatility" is ambiguous even in a financial sense. The most commonly referenced type of volatility is realized volatility which is the square root of realized variance. The key differences from the standard deviation of returns are:

  • Log returns (not simple returns) are used
  • The figure is annualized (usually assuming between 252 and 260 trading days per year)
  • In the case of variance swaps, log returns are not demeaned

There are a variety of methods for computing realized volatility; however, I have implemented the two most common below:

import numpy as np

window = 21  # trading days in rolling window
dpy = 252    # trading days per year
ann_factor = dpy / window

df['log_rtn'] = np.log(df['price']).diff()

# Var Swap (returns are not demeaned)
df['real_var'] = np.square(df['log_rtn']).rolling(window).sum() * ann_factor
df['real_vol'] = np.sqrt(df['real_var'])

# Classical (returns are demeaned, ddof=1); rolling var() of daily returns
# is a daily variance, so it annualizes by dpy rather than dpy / window
df['real_var'] = df['log_rtn'].rolling(window).var() * dpy
df['real_vol'] = np.sqrt(df['real_var'])
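A usage sketch of the var swap variant above; the synthetic random-walk price series is an assumption for illustration, and in practice df['price'] would hold your actual prices:

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({'price': 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300)))})

window = 21
dpy = 252
df['log_rtn'] = np.log(df['price']).diff()
df['real_vol'] = np.sqrt(np.square(df['log_rtn']).rolling(window).sum() * dpy / window)
print(df['real_vol'].tail())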
mcguip
  • np.log(df['price']).diff(): this won't work for negative returns, as log(0) = -inf and log(x < 0) is undefined. Or am I missing something? You have to do log(p1 / p0), which can be approximated by ln(1 + r) if r is small. https://www.investopedia.com/articles/investing/102715/computing-historical-volatility-excel.asp – Carl Jan 26 '21 at 11:24
  • 1
    I think what you mean is that this will not work for negative prices, not negative returns. Negative prices (or interest rates for that matter) require a different assumption on the underlying process, specifically normal vol https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2687742 – mcguip Jan 27 '21 at 12:39
  • No, I mean it will not work for negative returns. Log (x < 0) is undefined. As the link I posted describes, you must do log (p1 / p0) which is ~log(1 + r) as r tends to zero. – Carl Jan 27 '21 at 21:45
  • `var()` default is `ddof=0`, so you could probably just replace the first example with `var()` and second with `var(ddof=1)` – Baczek May 30 '21 at 20:14
  • 1
    Upvoted, feels like the best / most correct and complete answer here. – Kris Jun 06 '21 at 15:20
  • 1
    @carl I think this does work for negative returns. The formula is not taking the log of the difference but the difference of the log (of the price). So the formula works fine if prices are positive. If prices can go negative, using log returns isn't a good idea anyway (the intuition behind them assumes prices cannot go negative, so the returns get smaller as you approach 0). For instruments that can have negative prices, this intuition is probably wrong and you shouldn't use log returns. – Kris Jun 06 '21 at 15:25
  • Right you are ... somehow I missed the bracket! – Carl Jun 07 '21 at 16:40
  • I don't understand why you have this ann_factor. Your ann_factor counts how many 21-day windows there are in a year. "Mad Physicist" above does not have ann_factor in the same way; he just uses sqrt(252). But you use sqrt(252/21). So what is the difference? If I calculate the vol of 21 days and want to annualize it, should I multiply with sqrt(252/21) or with sqrt(252)? @mcguip – Orvar Korvar May 12 '22 at 22:15
  • The reason @Mad Physicist annualizes by 252 is because the standard deviation computed is daily. – mcguip Jan 16 '23 at 09:03
7

Here's one NumPy approach -

import numpy as np

# From http://stackoverflow.com/a/14314054/3293881 by @Jaime
def moving_average(a, n=3):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

# From http://stackoverflow.com/a/40085052/3293881
def strided_app(a, L, S=1):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

def rolling_meansqdiff_numpy(a, w):
    # View into all length-w windows and the mean of each window
    A = strided_app(a, w)
    B = moving_average(a, w)
    # Population std per window: sqrt of mean squared deviation from the mean
    subs = A - B[:, None]
    sums = np.einsum('ij,ij->i', subs, subs)
    return (sums / w)**0.5

Sample run -

In [202]: Ser = pd.Series(np.random.randint(0,9,(20)))

In [203]: rolling_meansqdiff_loopy(Ser, w=10)
Out[203]: 
[2.6095976701399777,
 2.3000000000000003,
 2.118962010041709,
 2.022374841615669,
 1.746424919657298,
 1.7916472867168918,
 1.3000000000000003,
 1.7776388834631178,
 1.6852299546352716,
 1.6881943016134133,
 1.7578395831246945]

In [204]: rolling_meansqdiff_numpy(Ser.values, w=10)
Out[204]: 
array([ 2.60959767,  2.3       ,  2.11896201,  2.02237484,  1.74642492,
        1.79164729,  1.3       ,  1.77763888,  1.68522995,  1.6881943 ,
        1.75783958])

Runtime test

Loopy approach -

def rolling_meansqdiff_loopy(Ser, w):
    length = Ser.shape[0] - w + 1
    volList = []
    for timestep in range(length):
        subSer = Ser[timestep:timestep + w]
        mean_i = np.mean(subSer)
        vol_i = (np.sum((subSer - mean_i)**2) / len(subSer))**0.5
        volList.append(vol_i)
    return volList

Timings -

In [223]: Ser = pd.Series(np.random.randint(0,9,(10000)))

In [224]: %timeit rolling_meansqdiff_loopy(Ser, w=10)
1 loops, best of 3: 2.63 s per loop

# @Mad Physicist's vectorized soln
In [225]: %timeit Ser.rolling(10).std(ddof=0)
1000 loops, best of 3: 380 µs per loop

In [226]: %timeit rolling_meansqdiff_numpy(Ser.values, w=10)
1000 loops, best of 3: 393 µs per loop

A speedup of close to 7000x there with the two vectorized approaches over the loopy one!

Divakar