Approximate the distribution of a sum of binomial random variables in R

Question

My goal is approximate the distribution of a sum of binomial variables. I use the following paper The Distribution of a Sum of Binomial Random Variables by Ken Butler and Michael Stephens.

I want to write an R script to find Pearson approximation to the sum of binomials. There is an R-package PearsonDS that allows do this in a simple way.

So I take the first example from the paper and try to find density of the Pearson distribution for this case. Finally i get an error message "There are no probability distributions with these moments".

Could you please explain me what's wrong in the below code?

library(PearsonDS)

# define parameters for five binomial random varibles

n<-rep(5,5)
p<-seq(0.02,0.10,0.02)

# find the first four cumulants

k.1<-sum(n*p)
k.2<-sum(n*p*(1-p))
k.3<-sum(n*p*(1-p)*(1-2*p))
k.4<-sum(n*p*(1-p)*(1-6*p*(1-p)))

# find the skewness and kurtosis parameters

beta.1<-k.3^2/k.2^3
beta.2<-k.4/k.2^2

# define the moments and calculate

moments <- c(mean=k.1,variance=k.2,skewness=sqrt(beta.1),kurtosis=beta.2)
dpearson(1:7,moments=moments)

I get the error message "There are no probability distributions with these moments".

It's not clear to me from the documentation how `dpearson` determines the distribution type when you override with `moments` . Try using The desired `dpearson[I thru VII]` function directly if you know which distribution you want to use. Alternatively, make sure all your `moments` arguments are legal, e.g. variance>0 and real. — Carl Witthoft, Apr 10 '13 at 15:45
@Carl, thanks for the response. I've read more carefully the documentation. — Eugeny Chankov, Apr 10 '13 at 20:07
The function `pearsonFitM` determines the distribution type. The message "There are no probability distributions with these moments" is displayed when the kurtosis minus 1 is less than the skewness. Unfortunately, I don't know and can not find the causes of this condition. If i drop it, then the cumulants define the PearsonI distribution. — Eugeny Chankov, Apr 10 '13 at 20:38
sorry, i made a typo. The correct form of the invalid condition should be _the kurtosis minus 1 is less than the square of the skewness_ — Eugeny Chankov, Apr 11 '13 at 05:14

KenHBS · Answer 1 · 2017-01-03T19:52:43.563

What you try to insert as kurtosis in your moments, is actually the excess kurtosis, which is just kurtosis - 3. From the help-page of dpearson():

moments:
optional vector/list of mean, variance, skewness, kurtosis (not excess kurtosis).

So adding 3 to beta.2 will provide you with the real kurtosis:

beta.1 <- (k.3^2)/(k.2^3)
beta.2 <- k.4/(k.2^2)
kurt <- beta.2 + 3

moments <- c(mean = k.1, variance = k.2, skewness = beta.1, kurtosis = kurt)
dpearson(1:7, moments=moments)
# [1] 0.3438773545 0.2788412385 0.1295129534 0.0411140817 0.0099279576
# [6] 0.0019551512 0.0003294087

To get a result like the one in the paper, we should investigate the cumulative distribution function and add 0.5 to correct for the bias caused by approximating a discrete distribution by a continuous one:

ppearson(1:7+0.5, moments = moments)
# [1] 0.5348017 0.8104394 0.9430092 0.9865434 0.9973715 0.9995578 0.9999339

A little background information:

The function threw an error because the relationship between kurtosis and skewness wasn't invalid: kurtosis is lower-bounded by the skewness in the following way: kurtosis >= (skewness)^2 - 1. The proof ain't pretty and is certainly beyond the scope of the question, but you can check out the references below if you like for different versions of this inequality.

Wilkins, J. Ernest. A Note on Skewness and Kurtosis. Ann. Math. Statist. 15 (1944), no. 3, 333--335. http://projecteuclid.org/euclid.aoms/1177731243.
K. Pearson. Mathematical contributions to the theory of evolution, XIX; second supplement to a memoir on skew variation. Philos. Trans. Roy. Soc. London Ser. A, 216 (1916), p. 432 http://rsta.royalsocietypublishing.org/content/216/538-548/429
Pearson, K. (1929). "Editorial note to 'Inequalities for moments of frequency functions and for various statistical constants'". Biometrika. 21 (1–4): 361–375. link

The numbers don't seem to match up with the numbers for the Pearson approximation in table 3. Could you comment on that? — bdeonovic, Jul 05 '17 at 15:07

Approximate the distribution of a sum of binomial random variables in R

1 Answers1

A little background information: