Confidence Interval and Standard error of Skewness and Kurtosis

Question

Please tell me how to calculate skewness and kurtosis along with their respective standard error and confidence interval associated with it(i.e. SE of Skewness and S.E of Kurtosis) I found two packages

1) package:'measure' can only calculate skewness and kurtosis

2) package:'rela' can calcuate both skewness and kurtosis but uses bootstrap by default and no command to turn it off during the calculation.

score 4 · Answer 1 · answered Jan 25 '19 at 16:49

I'm simply copying and pasting the code published by Howard Seltman in here:

# Skewness and kurtosis and their standard errors as implement by SPSS
#
# Reference: pp 451-452 of
# http://support.spss.com/ProductsExt/SPSS/Documentation/Manuals/16.0/SPSS 16.0 Algorithms.pdf
# 
# See also: Suggestion for Using Powerful and Informative Tests of Normality,
# Ralph B. D'Agostino, Albert Belanger, Ralph B. D'Agostino, Jr.,
# The American Statistician, Vol. 44, No. 4 (Nov., 1990), pp. 316-321

spssSkewKurtosis=function(x) {
  w=length(x)
  m1=mean(x)
  m2=sum((x-m1)^2)
  m3=sum((x-m1)^3)
  m4=sum((x-m1)^4)
  s1=sd(x)
  skew=w*m3/(w-1)/(w-2)/s1^3
  sdskew=sqrt( 6*w*(w-1) / ((w-2)*(w+1)*(w+3)) )
  kurtosis=(w*(w+1)*m4 - 3*m2^2*(w-1)) / ((w-1)*(w-2)*(w-3)*s1^4)
  sdkurtosis=sqrt( 4*(w^2-1) * sdskew^2 / ((w-3)*(w+5)) )
  mat=matrix(c(skew,kurtosis, sdskew,sdkurtosis), 2,
        dimnames=list(c("skew","kurtosis"), c("estimate","se")))
  return(mat)
}

To get skewness and kurtosis of a variable along with their standard errors, simply run this function:

x <- rnorm(100)
spssSkewKurtosis(x)

##             estimate    se
##    skew       -0.684 0.241
##    kurtosis    0.273 0.478

Note that the SE formula used assumed x to be normally distributed. — Remi Cuingnet, Dec 12 '19 at 11:04

score 3 · Answer 2 · answered Jun 13 '19 at 12:19

The standard errors are valid for normal distributions, but not for other distributions. To see why, you can run the following code (which uses the spssSkewKurtosis function shown above) to estimate the true confidence level of the interval obtained by taking the kurtosis estimate plus or minus 1.96 standard errors:

set.seed(12345)
Nsim = 10000
Correct = numeric(Nsim)
b1.ols = numeric(Nsim)
b1.alt = numeric(Nsim)
for (i in 1:Nsim) {
 Data = rnorm(1000)  
 Kurt = spssSkewKurtosis(Data)[2,1]
 seKurt =  spssSkewKurtosis(Data)[2,2]
  LowerLimit = Kurt -1.96*seKurt
  UpperLimit = Kurt +1.96*seKurt
  Correct[i] = LowerLimit <= 0 & 0 <= UpperLimit  
 }

TrueConfLevel = mean(Correct)
TrueConfLevel

This gives you 0.9496, acceptably close to the expected 95%, so the standard errors work as expected when the data come from a normal distribution. But if you change Data = rnorm(1000) to Data = runif(1000), then you are assuming that the data come from a uniform distribution, whose theoretical (excess) kurtosis is -1.2. Making the corresponding change from Correct[i] = LowerLimit <= 0 & 0 <= UpperLimit to Correct[i] = LowerLimit <= -1.2 & -1.2 <= UpperLimit gives the result 1.0, meaning that the 95% intervals were always correct, rather than correct for 95% of the samples. Hence, the standard error seems to be overestimated (too large) for the (light-tailed) uniform distribution.

If you change Data = rnorm(1000) to Data = rexp(1000), then you are assuming that the data come from an exponential distribution, whose theoretical (excess) kurtosis is 6.0. Making the corresponding change from Correct[i] = LowerLimit <= 0 & 0 <= UpperLimit to Correct[i] = LowerLimit <= 6.0 & 6.0 <= UpperLimit gives the result 0.1007, meaning that the 95% intervals were correct only for 10.07% of the samples, rather than correct for 95% of the samples. Hence, the standard error seems to be underestimated (too small) for the (heavy-tailed) exponential distribution.

Those standard errors are grossly incorrect for non-normal distributions, as the simulation above shows. Thus, the only use of those standard errors is to compare the estimated kurtosis with the expected theoretical normal value (0.0); e.g., using a test of hypothesis. They cannot be used to construct a confidence interval for the true kurtosis.

score 1 · Answer 3 · answered Aug 26 '20 at 15:18

@HBat is right: provided your sample data is Gaussian, you can compute the standard error using the equation from wikipedia

n = len(sample)
se_skew = ((6*n*(n-1))/((n-2)*(n+1)*(n+3)))**0.5

However, @BigBendRegion is also right: if your data is not Gaussian, this does not work. Then you may need to bootstrap.

R has the DescTools package that can bootstrap confidence intervals for skew (among other things). It can be included in python using rpy2 like so:

""" Import rpy2 and the relevant package"""
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
DescTools = importr('DescTools')
""" You will probably need this if you want to work with numpy arrays"""
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()


def compute_skew(data, confidence_level=0.99):
    """ Compute the skew and confidence interval using rpy2, DescTools
        @param data
        @return dict with keys: skew, skew_ci_lower, skew_ci_upper"""
    d = {}
    d["skew"], d["skew_ci_lower"], d["skew_ci_upper"] = DescTools.Skew(data, conf_level=confidence_level)
    return d

""" Call the function on your data (assuming that is saved in a variable named sample)"""
print(compute_skew(sample))

score 0 · Answer 4 · answered Jul 12 '16 at 12:05

0

try package psych:

> a <- data.frame(cola=rep(c('A','B','C'),100),colb=sample(1:1000,300),colc=rnorm(300))
> describe(a)
      vars   n   mean     sd median trimmed    mad   min    max  range  skew kurtosis    se
cola*    1 300   2.00   0.82   2.00    2.00   1.48  1.00   3.00   2.00  0.00    -1.51  0.05
colb     2 300 511.76 285.59 506.50  514.21 362.50  1.00 999.00 998.00 -0.04    -1.17 16.49
colc     3 300   0.12   1.04   0.05    0.10   1.07 -2.54   2.91   5.45  0.12    -0.24  0.06
> describe(a)$skew
[1]  0.00000000 -0.04418551  0.11857609

answered Jul 12 '16 at 12:05

Zahiro Mor

1,708
1
16
30

1

I think the OP wants the standard error *of the skew and kurtosis themselves* – Ben Bolker Jul 12 '16 at 12:07
@Zahiro Mor I am interested in the SE of the skewness and kurtosis – Jul 12 '16 at 12:21
2

SES and SEK can be calculated directly from the sample size n: where SES = square root of a fraction, 6 n times n minus 1 on top, and n minus 2, times n+1, times n+3 on bottom – Zahiro Mor Jul 12 '16 at 12:28
@ZahiroMor According to the link below they are the rough measure . http://www.real-statistics.com/tests-normality-and-symmetry/analysis-skewness-kurtosis/ – Jul 12 '16 at 12:33
check this out: https://estatistics.eu/what-is-statistics-standard-error-of-skewness-standard-error-of-kurtosis/ – Zahiro Mor Jul 12 '16 at 12:36
2

what you found is a rough estimate (sqrt(6/n) - I suggest using : `ses <- function(n) {sqrt((6*n*(n-1))/((n-2)*(n+1)*(n+3)))}` – Zahiro Mor Jul 12 '16 at 12:36
@ZahiroMor, does this work for non-normal samples as well? This formula can be used when the data comes from a normal distribution but when the distribution is, say, skewed, then I am not sure whether you can use the formula to calculate the standard error for of the estimate of skewness. – Skumin Jul 30 '17 at 10:55

score 0 · Answer 5 · answered Feb 10 '22 at 05:02

def skew_kurt(dataframe: pd.DataFrame) -> pd.DataFrame:

out = []
for col in dataframe:
    
    x = dataframe[col]
    sd = x.std()
    
    if sd == 0:
        out.append({name: np.nan for name in ['skew stat', 'skew se', 'kurt stat', 'kurt se']})
        continue
    
    w, m1 = len(x), x.mean()
    dif = x - m1
    m2, m3, m4 = tuple([(dif ** i).sum() for i in range(2, 5)])
    
    skew = w * m3 / (w - 1) / (w - 2) / sd ** 3
    skew_se = np.sqrt(6 * w * (w - 1) / ((w - 2) * (w + 1) * (w + 3)))
    
    kurt = (w * (w + 1) * m4 - 3 * m2 ** 2 * (w - 1)) / ((w - 1) * (w - 2) * (w - 3) * sd ** 4)
    kurt_se = np.sqrt(4 * (w ** 2 - 1) * skew_se ** 2 / ((w - 3) * (w + 5)))

    out.append({'skew stat': skew, 'skew se': skew_se, 'kurt stat': kurt, 'kurt se': kurt_se})
    
dataframe = pd.DataFrame(out, index = list(dataframe))
dataframe['skew<2'] = np.absolute(dataframe['skew stat']) < 2 * dataframe['skew se']
dataframe['kurt<2'] = np.absolute(dataframe['kurt stat']) < 2 * dataframe['kurt se']
    
return dataframe[['skew stat', 'skew se', 'skew<2', 'kurt stat', 'kurt se', 'kurt<2']]

Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). — Community, Feb 10 '22 at 10:17

Confidence Interval and Standard error of Skewness and Kurtosis

5 Answers5