0

This problem is about how to write a log likelihood function that computes the MLE for binomial distribution. The exact log likelihood function is as following:

enter image description here

Find the MLE estimate by writing a function that calculates the negative log-likelihood and then using nlm() to minimize it. Find the MLE estimate in this way on your data from part 1.b. Use an initial guess of $p= 0.5$.

My thought process is that, when using the MLE, only terms that involve p (the parameter) matters, so we can ignore the first (n choose x) term.

neg.loglik <- function(p, data){

 n <- 1000

  return(-sum(data*log(p) + (n-data)*log(1-p)))

 # return a single numeric value

 }

# nlm(f, p, ...)
nlm(neg.loglik, p = 0.5, data = data)
nlm(neg.loglik, p = 0.5, data = data)$estimate

I am not sure if my neg.loglik function is correct since the estimate seems to be so low... The true p is 1/3, but I am getting 0.013 for my MLE.

DATA:

You have an urn with 30 balls -- 10 are red, 10 are blue, and 10 are green. Write a single line of code to simulate randomly picking 400 balls from the urn with replacement. Create a variable num_green that records the number of green balls selected in the 400 draws.

set.seed(1)

urn <- c(rep("red", 10), rep("blue", 10), rep("green", 10))

sample <- sample(urn, size = 400, replace = TRUE)

num_green <- length(sample[sample == "green"])
num_green

Now repeat the above experiment 1000 times. Create a vector data, such that each element in data is the result (counting the number of green balls) from an independent trial like that described in 1.a.

set.seed(2)

data <- rep(NA,n)

for (i in 1:n){

 sample <- sample(urn, size = 400, replace = TRUE)

 data[i] <- length(sample[sample == "green"])

  }
R Newbie
  • 57
  • 6
  • 1
    What `data` are you using? If I use `data = 300` with your code I get an estimate of `0.2999999`, which seems spot on. Analytically, it's not too hard to show that the MLE of `p` is `successes / trials`, or in your notation `data / n`. – Gregor Thomas Dec 15 '17 at 21:40
  • It would be easier to help you if you provided a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can run and test your code. – MrFlick Dec 15 '17 at 21:43
  • your main problem is confusion between number of trials per binomial sample (400) and number of experimental trials (1000). Try setting `n <- 400` in your function and `n <- 1000` in your last code chunk (or even better, give these two variables different names to reduce confusion) – Ben Bolker Dec 16 '17 at 18:38
  • Your function seems to be missing a term inside the sum (the log of the binomial coefficient) – John Coleman Dec 16 '17 at 19:01

0 Answers0