2

I have a logistic regression model that I am using to predict size at maturity for king crab, but I am having trouble setting up the code for bootstrapping using the boot package. This is what I have:

#FEMALE GKC SAM#
LowerChatham<-read.table(file=file.choose(),header=TRUE)

#LOGISTIC REGRESSION FIT#
glm.out<-glm(Mature~CL,family=binomial(link=logit),data=LowerChatham)
plot(Mature~CL,data=LowerChatham)
lines(LowerChatham$CL,glm.out$fitted,col="red")
title(main="Lower Chatham")
summary(glm.out)
segments(98.9,0,98.9,0.5,col=1,lty=3,lwd=3)

SAM<-data.frame(CL=98.97)
predict(glm.out,SAM,type="response")

I would like to to bootstrap the statistic CL=98.97 since I am interested in the size at which 50% of crab are mature, but I have no idea how to setup my function to specify the that statistic and let alone the bootstrap function in general to get my 95% C.I. Any help would be greatly appreciated! Thanks!

StupidWolf
  • 45,075
  • 17
  • 40
  • 72
user2544557
  • 23
  • 1
  • 6
  • You need the `boot` package. (It is very unclear what you mean by "bootstrap the statistic CL=98.97", so you may need to do further homework or flag the question for transfer to CrossValidated.com). Remember ... we cannot see your screen. – IRTFM Jul 03 '13 at 03:00
  • 3
    Try writing a reproducible example. Check out this how to: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – marbel Jul 03 '13 at 03:15
  • Yes, I am trying to use the boot package to bootstrap my logistic regression model to get confidence intervals for CL=98.97 to determine how confident I am in my estimate of deciding if a crab at that size is mature or not. But I am having trouble understanding how to write the function so it always returns CL=98.97. I also don't understand what the indices are referring to in the bootpackage. – user2544557 Jul 10 '13 at 17:02

1 Answers1

3

In each bootstrap iteration, you want to do something like

range <- 1:100 # this could be any substantively meaningful range
p <- predict(glm.out, newdata = data.frame(CL=range), "response")
range[match(TRUE,p>.5)] # predicted probability of 50% maturity

where you specify a range of values of CL to whatever precision you need. Then calculate the predicted probability of maturity at each of those levels. Then find the threshold value in the range where the predicted probability cross 0.5. This is the statistic it sounds like you want to bootstrap.

You also don't need the boot to do this. If you define a function that samples and outputs that statistic as its result, you can just do replicate(1000, myfun) to get your bootstrap distribution, as follows:

myfun <- function(){
    srows <- sample(1:nrow(LowerChatham),nrow(LowerChatham),TRUE)
    glm.out <- (Mature ~ CL, family=binomial(link=logit), data=LowerChatham[srows,])
    range <- 1:100 # this could be any substantively meaningful range
    p <- predict(glm.out, newdata = data.frame(CL=range), "response")
    return(range[match(TRUE,p>.5)]) # predicted probability of 50% maturity
}
bootdist <- replicate(1000, myfun()) # your distribution
quantile(unlist(bootdist),c(.025,.975)) # 95% CI
Thomas
  • 43,637
  • 12
  • 109
  • 140
  • Awesome! thank you very much! I am still having issues with defining a function that outputs that statistic using the boot package. Do I just change my model to glm.out<-(Mature~CL=99,family=binomial(link=logit),data=LowerChatham)??? or do I need to define a data frame and somehow insert it into the function. My R skills are not that good so thanks for your patience in advance with this. – user2544557 Jul 10 '13 at 17:08
  • Right on everything works except the last line of code. quantile(bootdist,c(.025,.975)) When I run this line I get the following error and I have no idea how to fix this. Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 'x' must be atomic – user2544557 Jul 10 '13 at 21:01
  • Hmmm...I get the same error message. This is what I put in: bootdist <- replicate(1000, myfun) # your distribution quantile(unlist(bootdist),c(.025,.975))) # 95% CI – user2544557 Jul 10 '13 at 21:08
  • What does `bootdist` look like (before you run `quantile`)? – Thomas Jul 10 '13 at 21:13
  • This is everything I put in: library(boot) myfun <- function(){ srows <-sample(1:nrow(LowerChatham),nrow(LowerChatham),TRUE) glm.out <-glm(Mature~CL,family=binomial(link=logit),data=LowerChatham[srows,]) range <- 1:200 # this could be any substantively meaningful range p <- predict(glm.out, newdata = data.frame(CL=range), "response") return(range[match(TRUE,p>.5)]) # predicted probability of 50% maturity } bootdist <- replicate(1000, myfun) # your distribution quantile(unlist(bootdist),c(.025,.975))) # 95% CI The results in bootdist are pretty long – user2544557 Jul 10 '13 at 21:15
  • I meant is `bootdist` a list (and is it just a list of single-element vectors)? Or, is it something else? – Thomas Jul 10 '13 at 21:25
  • oh sorry [bootdist] is a list of each replication – user2544557 Jul 10 '13 at 21:28
  • > quantile(unlist(bootdist),c(.025,.975)) # 95% CI Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 'x' must be atomic – user2544557 Jul 10 '13 at 21:36
  • Awesome it worked!! Thank you very very much for your patience with me and working on this. – user2544557 Jul 10 '13 at 21:50