1

For my thesis I have to fit some glm models with MLEs that R doesn't have, I was going ok for the models with close form but now I have to use de Gausian CDF, so i decide to fit a simple probit model. this is the code:

Data:
set.seed(123)
x <-matrix( rnorm(50,2,4),50,1)
m <- matrix(runif(50,2,4),50,1)
t <- matrix(rpois(50,0.5),50,1)
z <- (1+exp(-((x-mean(x)/sd(x)))))^-1 + runif(50)
y <- ifelse(z < 1.186228, 0, 1)

data1 <- as.data.frame(cbind(y,x,m,t))

myprobit <- function (formula, data) 
{
  mf <- model.frame(formula, data)
  y <- model.response(mf, "numeric")
  X <- model.matrix(formula, data = data)
  if (any(is.na(cbind(y, X)))) 
    stop("Some data are missing.")
  loglik <- function(betas, X, y, sigma) {     #loglikelihood
    p <- length(betas)
    beta <- betas[-p]
    eta <- X %*% beta
    sigma <- 1    #because of identification, sigma must be equal to 1
    G <- pnorm(y, mean = eta,sd=sigma)
    sum( y*log(G) + (1-y)*log(1-G))
  }
  ls.reg <- lm(y ~ X - 1)#starting values using ols, indicating that this model already has a constant
  start <- coef(ls.reg)


  fit <- optim(start, loglik, X = X, y = y, control = list(fnscale = -1), method = "BFGS", hessian = TRUE) #optimizar
  if (fit$convergence > 0) {
    print(fit)
    stop("optim failed to converge!") #verify convergence
  }

  return(fit)
}

myprobit(y ~ x + m + t,data = data1)

And i get: Error in X %*% beta : non-conformable arguments, if i change start <- coef(ls.reg) with start <- c(coef(ls.reg), 1) i get wrong stimatives comparing with:

probit <- glm(y ~ x + m + t,data = data1 , family = binomial(link = "probit"))

What am I doing wrong? Is possible to correctly fit this model using pnorm, if no, what algorithm should I use to approximate de gausian CDF. Thanks!!

  • Usually R people want to see complete (i.e.: executable) examples with sample data. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – U. Windl Mar 05 '19 at 00:34
  • Deleted my "answer" since it clearly wasn't responsive to your code-review type question. But still think it's not clear why ypu are doing it this way. What barriers are there to using the capacity to build new families for `glm`? – IRTFM Mar 05 '19 at 02:03
  • My reseach is about rare events (specially thing connected with credit), so i develop a sistem of equations that when you plugin the distributions they return some desired probabilities (the king correcting is a special case, for instance), but the model must be fiting somehow, for me the natural choise was MLE, so i derive some MLEs and now i am traying to implement them. But first i have to see if what i am doing works im simple models, specially im models where the CDF dont have a close form. – João Vieira Mar 05 '19 at 02:51
  • By the way i am a new R user (i start 3 week ago), i am dividing my time between learning R and fitting models – João Vieira Mar 05 '19 at 02:54

1 Answers1

0

The line of code responsible for your error is the following:

eta <- X %*% beta

Note that "%*%" is the matrix multiplication operator. By reproducing your code I noticed that X is a matrix with 50 rows and 4 columns. Hence, for matrix multiplication to be possible your "beta" needs to have 4 rows. But when you run "betas[-p]" you subset the betas vector by removing its last element, leaving only three elements instead of the four you need for matrix multiplication to be defined. If you remove [-p] the code will work.

Olafant
  • 829
  • 1
  • 7
  • 16
Pedro Fonseca
  • 311
  • 1
  • 3
  • 11