1

I'm trying rnbinom as below

x<- rnbinom(500, mu = 4, size = .1)
xtrunc <- x[x>0]

then I just get 125 observations.

However, I want to make 500 observations excluding 0 (zero) with same condition (mu = 4, size =.1).

Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
lee
  • 23
  • 4

1 Answers1

1

This does the job:

N <- 500    ## target number of samples

## set seed for reproducibility
set.seed(0)
## first try
x <- rnbinom(N, mu = 4, size = .1)
p_accept <- mean(success <- x > 0)  ## estimated probability of accepting samples
xtrunc <- x[success]
## estimated further sampling times
n_further <- (N - length(xtrunc)) / p_accept
## further sampling
alpha <- 1.5   ## inflation factor
x_further <- rnbinom(round(alpha * n_further), mu = 4, size = .1)
## filter and combine
xtrunc <- c(xtrunc, (x_further[x_further > 0])[seq_len(N - length(xtrunc))])

## checking
length(xtrunc)
# [1] 500

summary(xtrunc)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   1.00    2.00    5.00   12.99   16.00  131.00 

In above, sampling takes two stages. The result of the initial stage is used to estimate probability of acceptance rate to guide the second stage sampling.

However, since the underlying distribution is explicitly known, the theoretical probability of acceptance rate is know. Therefore, there is no need to perform a two-stage approach in this case. Try:

p <- 1 - pnbinom(0, mu = 4, size = .1)  ## theoretical probability
alpha <- 1.5
n_try <- round(alpha * N / p)
set.seed(0)
x <- rnbinom(n_try, mu = 4, size = .1)
xtrunc <- (x[x > 0])[1:N]

## checking
length(xtrunc)
# [1] 500

summary(xtrunc)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   1.00    2.00    5.00   12.99   16.00  131.00

The idea behind is the theory of geometric distribution. My answer here is closely related. Read the "More efficient vectorized method" section for detailed explanation.

Community
  • 1
  • 1
Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
  • Your coding is useful. However, I'm solving to make the same distribution with mean and even with variation x<- rnbinom(500, mu = 4, size = .1); x is more than 1 – lee Sep 01 '16 at 01:29