7

I am using Pandas to compute some financial risk analytics, including Value at Risk. In short, to compute Value at Risk (VaR), you take a time series of simulated portfolio changes in value, and then compute a specific tail percentile loss. For example, 95% VaR is the 5th percentile figure in that time series.

I have my time series in a Pandas dataframe, and am currently using the pd.quantile() function to compute the percentile. My question is, typical market convention for VaR is use an exclusionary percentile (ie: 95% VaR is interpreted as: there is a 95% chance your portfolio will not loose MORE than the computed number) - akin to how MS Excel PERECENTILE.EXC() works. Pandas quantile() works akin to how Excel's PERCENTILE.INC() works - it includes the specified percentile. I have scoured several python math packages as well as this forum for a python solution that uses the same methodology as PERCENTILE.EXC() in Excel with no luck. I was hoping someone here might have a suggestion?

Here is sample code.

import pandas as pd
import numpy as np

test_pd = pd.Series([15,14,18,-2,6,-78,31,21,98,-54,-2,-36,5,2,46,-72,3,-2,7,9,34])
test_np = np.array([15,14,18,-2,6,-78,31,21,98,-54,-2,-36,5,2,46,-72,3,-2,7,9,34])

print 'pandas: ' + str(test_pd.quantile(.05))
print 'numpy: '+ str(np.percentile(test_np,5))

The answer i am looking for is -77.4

Thanks,

Ryan

B--rian
  • 5,578
  • 10
  • 38
  • 89
ryanr377
  • 81
  • 1
  • 5
  • 1
    Is it possible to provide a small example of the input data you are working with, the output that you are currently getting with `pd.quantile` and an example of the output you would like to get? It sounds like `numpy.percentile` would do what you are looking for, but I would be hesitant to post an answer without knowing what the data looks like. – johnchase Jul 26 '16 at 17:41
  • Sure - I have actually tried that, both numpy percentile and pandas quantile use the same methodology - see new example added to original post. In Excel, this array returns -72 via PERCENTILE.INC and -77.4 using PERCENTILE.EXC. I need a Python solution that gets me to -77.4 – ryanr377 Jul 26 '16 at 18:09

3 Answers3

5

It won't be as efficient as Pandas' own percentile but it should work:

def quantile_exc(ser, q):
    ser_sorted = ser.sort_values()
    rank = q * (len(ser) + 1) - 1
    assert rank > 0, 'quantile is too small'
    rank_l = int(rank)
    return ser_sorted.iat[rank_l] + (ser_sorted.iat[rank_l + 1] - 
                                     ser_sorted.iat[rank_l]) * (rank - rank_l)

ser = pd.Series([15,14,18,-2,6,-78,31,21,98,-54,-2,-36,5,2,46,-72,3,-2,7,9,34])

quantile_exc(ser, 0.05)
Out: -77.400000000000006

quantile_exc(ser, 0.1)
Out: -68.399999999999991

quantile_exc(ser, 0.3)
Out: -2.0

Note that Excel fails for small percentiles; it is not a bug. It is because ranks that go below the minimum value is not suitable for interpolation. So you might want to check if rank > 0 in the quantile_exc function (see the assertion part).

ayhan
  • 70,170
  • 20
  • 182
  • 203
  • That works exactly as expected! I tested against Excel using over 1,000 different full length time series (504 observations each) and it matches at both the 0.05 and 0.01. Thank you very much. Do you know if this is the exact methodology built into Excel? – ryanr377 Jul 26 '16 at 19:55
  • Other than some value checking, errors etc., it should be the same by definition (for example this one does not check if 0 – ayhan Jul 26 '16 at 19:57
  • 2
    Matching Excel's formula has been making me nuts for a week! This seriously helped. Thank you. – mauve Oct 04 '18 at 20:16
2

EDIT: I just saw your edit. I think you are making a mistake. The value -77.4 is actually the 99.5% percentile of your data. Try test_pd.quantile(.005). I believe that you must have made a mistake in Excel when specifying your percentile.

EDIT 2: I just tested it myself in Excel. For the 50-th percentile, I am getting the correct value in both Excel and Numpy/Pandas. For the 5th percentile however, I am getting -72 in Pandas/Numpy, and -74.6 in Excel. But Excel is just wrong here: it is very obvious that -74.6 is the 0.5th percentile, not the 5th...

FINAL EDIT: After some testing, it seems like Excel is behaving erratically around very small values of k with the PERCENTILE.EXC() function. Indeed, using the function with any k < 0.05 returns an error, so 0.05 must be a threshold below which the function is not working properly. I do not know why Excel chooses to return the 0.5th percentile when asked to exclude the 5th percentile (the logical behavior would be to return the 4.9th percentile, or the 4.99th...). However, both Numpy, Pandas and Excel return the same values for other values of k. For instance, PERCENTILE.EXC(0.5) = 6, and test_pd.quantile(0.5) = 6 as well. I guess the lesson is that we need to be wary of Excel's behavior ;).

The way I understand your problem is: you want to know the value that corresponds to the k-th percentile of your data, this k-th percentile excluded. However, pd.quantile() returns the value that corresponds to your k-th percentile, this k-th percentile included.

I do not think that pd.quantile() returning the k-th percentile included is an issue. Indeed, assuming you want all stocks having a Value at Risk strictly above the 5-th percentile, you would do:

mask = data["VaR"] < pd.quantile(data["VaR"], 0.05)
data_filt = data[mask]

Because you used a "smaller than" ( < ) operator, the values which exactly correspond to your 5-th percentile will be excluded, similar to Excel's PERCENTILE.EXC() function.

Do tell me if this is what you were looking for.

Andreq
  • 407
  • 4
  • 14
  • -77.4 is indeed the result of PERCENTILE.EXC(array,0.05) in Excel, which also happens to be the same result as PERCENTILE.INC(array,0.005). Looking at your example now. – ryanr377 Jul 26 '16 at 18:31
  • Yes, I just tested it. I was going over the Excel documentation: it is not clear at all what they mean by "Exclusive" or "Inclusive". In other words, what is the threshhold that they are using? Is EXC(0.05) equal to INC(0.049), or to which number? I'm sorry, I'm a bit at loss with Excel's behavior here. FINAL EDIT: Excel returns an error when inputing a number below 0.05, which means that the EXC() function must be buggy at very small values. My hunch is that it consider 0.005 to the "next" percentile to 0.05, which is why it returns the same value as pd.quantile(0.005). – Andreq Jul 26 '16 at 18:34
  • Agreed - I have no idea how it works. I used a solver function with PERCENTILE.INC, and in this particular example it used 0.00499939999999992, but this number seems change as the array length changes though. I was hoping it was a consistent percentage of the supplied percentile. – ryanr377 Jul 26 '16 at 18:41
  • Very weird indeed, clever approach with the solver. If Microsoft has a bug bounty program, you might want to submit this bug! I shiver at the number of P&L or stock reports which might have been affected by this weird issue ;). – Andreq Jul 26 '16 at 18:45
  • I think the limitation with Excel's .EXC version is not with small values ok k, but with short arrays. I am using PERCENTILE.EXC(array,0.01) with longer arrays without error - i guess I need to find more information (if possible) on how Excel determines the cut-off to exclude. – ryanr377 Jul 26 '16 at 18:50
  • Haha - very true. Thanks for your help. – ryanr377 Jul 26 '16 at 18:51
0

A few years after the original question was asked, however I had to build a pandas script to replicate/simplify/check complex array formulas.... here are my thoughts

Say you wanted the 90th percentile

PERCENTILE.EXC is the 0.9(n+1) th value
PERCENTILE.INC is the 0.9(n-1)+1 th value

PERCENTILE.EXC will work with 9 or more entries. By default Pandas will use the PERCENTILE.INC equivalent i.e. df.quantile(0.9).

If you want to replicate both function use numpy

np.percentile(x, 90,method='linear') ---> PERCENTILE.INC
np.percentile(x, 90,method='weibull') ---> PERCENTILE.EXC

Change the number to suit your needs.

Regards

sxs
  • 29
  • 2