This problem is about writing a log-likelihood function and using it to compute the MLE for a binomial distribution. The exact problem statement is as follows:
Find the MLE estimate by writing a function that calculates the negative log-likelihood and then using nlm()
to minimize it. Find the MLE estimate in this way on your data from part 1.b. Use an initial guess of $p= 0.5$.
My thought process is that, when maximizing the likelihood, only the terms that involve the parameter $p$ matter, so we can ignore the leading $\binom{n}{x}$ term.
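For reference, the full log-likelihood I have in mind is (writing $m$ for the number of draws per trial, $x_i$ for the $i$-th count, and $N$ for the number of trials; this notation is mine, not the assignment's):

$$\ell(p) = \sum_{i=1}^{N} \left[ \log\binom{m}{x_i} + x_i \log p + (m - x_i)\log(1 - p) \right]$$

The first term does not depend on $p$, so minimizing $-\sum_{i=1}^{N} \left[ x_i \log p + (m - x_i)\log(1 - p) \right]$ gives the same maximizer $\hat{p}$.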
neg.loglik <- function(p, data){
  n <- 1000
  return(-sum(data*log(p) + (n-data)*log(1-p)))
  # return a single numeric value
}
# nlm(f, p, ...)
nlm(neg.loglik, p = 0.5, data = data)
nlm(neg.loglik, p = 0.5, data = data)$estimate
I am not sure whether my neg.loglik function is correct, since the estimate seems far too low: the true $p$ is 1/3, but I am getting 0.013 for my MLE.
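For comparison, here is a sanity check I would run (my own addition, not part of the assignment): the binomial MLE has the closed form $\hat{p} = \bar{x}/m$, where $m$ is the number of draws per trial, so the nlm() result can be checked against it directly.

# Sanity check (my addition): closed-form binomial MLE = average count / draws per trial
m <- 400            # each trial in part 1.a draws 400 balls
mean(data) / m      # should be close to the true p of 1/3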
DATA:
You have an urn with 30 balls -- 10 are red, 10 are blue, and 10 are green. Write a single line of code to simulate randomly picking 400 balls from the urn with replacement. Create a variable num_green
that records the number of green balls selected in the 400 draws.
set.seed(1)
urn <- c(rep("red", 10), rep("blue", 10), rep("green", 10))
sample <- sample(urn, size = 400, replace = TRUE)
num_green <- length(sample[sample == "green"])
num_green
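Since the assignment asks for a single line of code, the draw-and-count step can also be compressed into one line (same logic as above, just condensed):

num_green <- sum(sample(urn, size = 400, replace = TRUE) == "green")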
Now repeat the above experiment 1000 times. Create a vector data, such that each element in data is the result (counting the number of green balls) from an independent trial like that described in 1.a.
set.seed(2)
n <- 1000            # number of repeated trials
data <- rep(NA, n)
for (i in 1:n){
  sample <- sample(urn, size = 400, replace = TRUE)
  data[i] <- length(sample[sample == "green"])
}
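For what it is worth, the same simulation can also be written more compactly with replicate() (an alternative sketch, not what the assignment asks for):

# Repeat the 400-draw experiment 1000 times, counting greens each time
data <- replicate(1000, sum(sample(urn, size = 400, replace = TRUE) == "green"))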