
I am trying to calculate a confidence interval in R. For specific reasons, I have to do it with the functions in the "bootstrap" package (which means I can't use the functions in the "boot" package).

What I am doing is computing the Pearson correlation coefficient and then applying the bootstrap method (with B = 100) to obtain an estimate of the correlation coefficient. But I don't know how to construct the 95% confidence interval.

Here is my code:

library(bootstrap) 
data('law')

set.seed(1)
theta <- function(ind) {
  cor(law[ind, 1], law[ind, 2], method = "pearson")
}
law.boot <- bootstrap(1:15, 100, theta)
# sd(law.boot$thetastar)  # bootstrap standard error of the correlation
percent.95 <- function(x) {
  quantile(x, .95)
}
law.percent.95 <- bootstrap(1:15, 100, theta, func = percent.95)

Sorry if I didn't make myself clear or used the wrong tags. Sorry again for not providing a dataset at first (it is provided now), and thanks to Professor Roland for pointing it out. Thanks very much!

StupidWolf
Lucida
  • "which is a matrix that has 2 lists." That is a very unusual data structure. Therefore, you need to provide a reproducible example (see this [FAQ](https://stackoverflow.com/a/5963610/1412059)) or at least the output of `str(CD)`. – Roland Nov 29 '18 at 07:33

2 Answers


Usually, after bootstrapping, we use the 2.5% and 97.5% percentiles of the bootstrap replicates as a 95% confidence interval (with α = 0.05, we cut off α/2 = 0.025 in each tail). See also @thothal's answer and the comments under the answers.

R <- 1e5 - 1  ## number of bootstrap replications
est <- cor(law[, 1], law[, 2])  ## naïve correlation (columns taken by position, as in theta below)

theta <- function(ind) cor(law[ind, 1], law[ind, 2], method="pearson")
set.seed(1)
B1 <- bootstrap::bootstrap(seq(nrow(law)), R, theta) 
(ci1 <- c(estimate=est, quantile(B1$thetastar, c(.025, .975))))
#  estimate      2.5%     97.5% 
# 0.7763745 0.4594845 0.9620884 
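
The α/2 rule above can be wrapped in a small helper so that the confidence level is a parameter. This is only a minimal sketch: `percentile_ci` is a hypothetical name, not a function from the "bootstrap" package, and the 1000 replications in the demo are an arbitrary choice. The demo runs only if the "bootstrap" package is installed.

```r
# Hypothetical helper (not part of the 'bootstrap' package):
# percentile interval at an arbitrary confidence level.
percentile_ci <- function(thetastar, level = 0.95) {
  alpha <- 1 - level
  # cut off alpha/2 in each tail of the bootstrap distribution
  quantile(thetastar, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Demo on the law data; skipped when the package is not available.
if (requireNamespace("bootstrap", quietly = TRUE)) {
  data(law, package = "bootstrap")
  theta <- function(ind) cor(law[ind, 1], law[ind, 2])
  set.seed(1)
  B <- bootstrap::bootstrap(seq_len(nrow(law)), 1000, theta)
  print(percentile_ci(B$thetastar, level = 0.95))  # 95% interval
  print(percentile_ci(B$thetastar, level = 0.90))  # 90% interval
}
```

Keeping the level as an argument makes it harder to mix up a one-sided quantile such as `quantile(x, .95)` with the two tail quantiles a two-sided interval needs.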

Here is an alternative approach from scratch:

theta2 <- function(x) cor(x[, 1], x[, 2])  ## columns taken by position
set.seed(1)
B2 <- replicate(R, theta2(law[sample(nrow(law), nrow(law), replace=TRUE), ]))
(ci2 <- c(estimate=est, quantile(B2, c(.025, .975))))
#  estimate      2.5%     97.5% 
# 0.7763745 0.4607644 0.9617970 

And finally, an approach using the boot package, which has a boot.ci function:

theta3 <- function(data, k) cor(data[k, ])[1,2]
set.seed(1)
B3 <- boot::boot(law, theta3, R=R)
(ci3 <- c(est, boot::boot.ci(B3, type='perc')$percent[4:5]))
# [1] 0.7763745 0.4593727 0.9620923
jay.sf
  • Thank you for correcting my code! But I still don't know how to construct the confidence interval from the output table. Can you please tell me which values I should use to get the CI (that is, to output an interval)? Thanks again! – Lucida Nov 29 '18 at 08:18
  • Thank you so much! I don't know whether it is appropriate to use the normal distribution here, but it kind of makes sense for the data I'm working with now. I will discuss this with my professor later. Again, thanks so much. You are so professional and your answer helps a lot! – Lucida Nov 29 '18 at 13:10

There are different ways of calculating the CI for the bootstrap estimator (see this Wikipedia article, for instance).

The easiest is to take the 2.5% and 97.5% quantiles of the bootstrapped coefficients (Percentile Bootstrap in the Wikipedia article):

quantile(law.boot$thetastar, c(0.025, 0.975))
#      2.5%     97.5% 
# 0.4528745 0.9454483 

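The Basic Bootstrap interval would be calculated as follows (here centred on the mean of the bootstrap replicates; the lower bound is printed first, under the 97.5% label):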

2 * mean(law.boot$thetastar) - quantile(law.boot$thetastar, c(0.975, 0.025))
#     97.5%      2.5% 
# 0.5567887 1.0493625
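
As a hedged side note: the basic (reverse percentile) interval is usually written around the estimate from the original sample, i.e. (2θ̂ − θ*₍0.975₎, 2θ̂ − θ*₍0.025₎), whereas the snippet above centres on the mean of the bootstrap replicates. The two differ by the bootstrap estimate of bias. The sketch below uses the original-sample estimate; `basic_ci` is a hypothetical helper name, and the demo runs only if the "bootstrap" package is installed.

```r
# Hypothetical helper (not part of the 'bootstrap' package):
# basic bootstrap interval centred on the original-sample estimate theta_hat.
basic_ci <- function(theta_hat, thetastar, level = 0.95) {
  alpha <- 1 - level
  # upper quantile first: it produces the LOWER bound after reflection
  q <- quantile(thetastar, probs = c(1 - alpha / 2, alpha / 2), names = FALSE)
  c(lower = 2 * theta_hat - q[1], upper = 2 * theta_hat - q[2])
}

# Demo on the law data; skipped when the package is not available.
if (requireNamespace("bootstrap", quietly = TRUE)) {
  data(law, package = "bootstrap")
  theta <- function(ind) cor(law[ind, 1], law[ind, 2])
  theta_hat <- theta(seq_len(nrow(law)))  # correlation in the original sample
  set.seed(1)
  law.boot <- bootstrap::bootstrap(seq_len(nrow(law)), 100, theta)
  print(basic_ci(theta_hat, law.boot$thetastar))
}
```

Because the interval is a reflection around the estimate, the upper bound can exceed 1 for a correlation close to 1. Whether to truncate it at 1 or switch to another interval type is a judgment call that this sketch does not make.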
thothal
  • Thank you very much! You're right that this is the easiest way to get the CI, and your advice (to check the Wikipedia article) is very helpful! But something went wrong when I ran your code in RStudio: the CIs I got were a little different from yours (I get [0.4606497, 0.9622643] and [-0.1914946, 0.3101201]). Anyway, thank you very much! Your answer is very inspiring! Thanks! – Lucida Nov 29 '18 at 13:22
  • Well, since the bootstrap estimators depend on the samples drawn at random, I am not surprised to see different results ;) – thothal Nov 29 '18 at 16:13
  • Notice that the “basic bootstrap CI” includes impossible values. I’m giving this answer my upvote because it is more true to the spirit of the bootstrap method and theoretic justification. – IRTFM Nov 29 '18 at 17:45
  • Oh, I got it! I think I overcomplicated the question before. Thank you so much! – Lucida Nov 30 '18 at 13:27
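
To illustrate the point about random variation made in the comments, here is a rough sketch that recomputes the percentile interval for several seeds, once with B = 100 and once with a larger B. The helper name `ci_for_seed`, the seeds 1 to 5, and B = 5000 are illustrative assumptions, not values from the thread. The code runs only if the "bootstrap" package is installed.

```r
if (requireNamespace("bootstrap", quietly = TRUE)) {
  data(law, package = "bootstrap")
  theta <- function(ind) cor(law[ind, 1], law[ind, 2])

  # Hypothetical helper: percentile CI for a given seed and number of replications B
  ci_for_seed <- function(seed, B) {
    set.seed(seed)
    b <- bootstrap::bootstrap(seq_len(nrow(law)), B, theta)
    quantile(b$thetastar, c(0.025, 0.975))
  }

  # One row per seed; compare how much the bounds move between seeds
  print(t(sapply(1:5, ci_for_seed, B = 100)))
  print(t(sapply(1:5, ci_for_seed, B = 5000)))
}
```

With the same seed and the same B the result is reproducible. Different seeds give different bounds, and the spread between seeds would be expected to shrink as B grows.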