
I'm using the function multinom from the nnet package to run a multinomial logistic regression.

In multinomial logistic regression, as I understand it, the model describes the log of the ratio of the probability of a response to the probability of the reference response as a linear function of the predictors (i.e., ln(P(i)/P(r)) = B1 + B2*X..., where i is one response category, r is the reference category, and X is a predictor).

However, fitted(multinom(...)) produces estimates for each category, even the reference category r.

EDIT: Example:

set.seed(1)
library(nnet)
DF <- data.frame(X = as.numeric(rnorm(30)), 
                 Y = factor(sample(letters[1:5],30, replace=TRUE)))
DF$Y<-relevel(DF$Y, ref="a") #ensure a is the reference category
model <- multinom(Y ~ X, data = DF)
coef(model)
#  (Intercept)           X
#b   0.1756835  0.55915795
#c  -0.2513414 -0.31274745
#d   0.1389806 -0.12257963
#e  -0.4034968  0.06814379

head(fitted(model))
#          a         b          c         d         e
#1 0.2125982 0.2110692 0.18316042 0.2542913 0.1388810
#2 0.2101165 0.1041655 0.26694618 0.2926508 0.1261210
#3 0.2129182 0.2066711 0.18576567 0.2559369 0.1387081
#4 0.1733332 0.4431170 0.08798363 0.1685015 0.1270647
#5 0.2126573 0.2102819 0.18362323 0.2545859 0.1388516
#6 0.1935449 0.3475526 0.11970164 0.2032974 0.1359035

head(DF)
#           X Y
#1 -0.3271010 a

To calculate the predicted probability ratio between response b and response a for row 1, we compute exp(0.1756835 + 0.55915795*(-0.3271010)) = 0.9928084. I see that this matches the fitted P(b)/P(a) for row 1 (0.2110692/0.2125982 = 0.9928084).

Is the fitted probability for the reference category calculated algebraically (e.g., 0.2110692/exp(0.1756835+0.55915795*(-0.3271010)))?
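Written out in R, the two checks above look like this (this is just the algebra I am guessing at, using the coef(model) and fitted(model) objects above, not necessarily how multinom computes things internally):

x1 <- DF$X[1]                                   # predictor value for row 1
eta_b <- coef(model)["b", "(Intercept)"] + coef(model)["b", "X"] * x1
exp(eta_b)                                      # predicted P(b)/P(a), ~0.9928084
fitted(model)[1, "b"] / fitted(model)[1, "a"]   # fitted ratio, ~0.9928084
fitted(model)[1, "b"] / exp(eta_b)              # recovers fitted P(a), ~0.2125982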

Is there a way to obtain the equation for the predicted probability of the reference category?

jflournoy
  • I added some data to your question. Can you clarify your question using this example? What is the expected output? – agstudy Jun 24 '13 at 20:23
  • Beyond the lack of a data example or specification of the package from which function `multinom` might have been extracted, I believe you have incorrectly specified how to represent odds ratios. Odds are `P(i)/P(!i)`, so odds ratios have 4 probability expressions in ratio form. – IRTFM Jun 24 '13 at 20:51
  • DWin, you are correct with regard to a binomial logistic regression. However I believe that in multinomial logistic regression the predicted odds ratio is actually the ratio between the probability of some response over the probability of a reference response--see the appropriate [Wikipedia page](http://en.wikipedia.org/wiki/Multinomial_logistic_regression). – jflournoy Jun 24 '13 at 20:59
  • Odds are the probability of an event (possibly conditional on other features) divided by the probability of a non-event (conditional on the same features). A Wikipedia citation does not change that fact, especially so when the Wikipedia section on "odds ratios" and "odds" has the correct formulation. – IRTFM Jun 25 '13 at 00:41
  • DWin, I appreciate your correction with regard to my terminology. I should have said 'probability ratio' rather than 'odds'. – jflournoy Jun 25 '13 at 23:53

2 Answers


I had the same question, and after looking around I think the solution is this: given 3 classes a, b, c (with a as the reference) and the fitted(model) probabilities pa, pb, pc output by the algorithm, you can reconstruct those probabilities from these 3 equations:

log(pb/pa) = beta1*X

log(pc/pa) = beta2*X

pa+pb+pc=1

Where beta1,beta2 are the rows of the output of coef(model), and X is your input data.

Playing with those equations you get to:

pb = exp(beta1*X)/(1+exp(beta1*X)+exp(beta2*X))

pc = exp(beta2*X)/(1+exp(beta1*X)+exp(beta2*X))

pa = 1 - pb - pc
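
For instance, here is a minimal R sketch of this reconstruction for the first row of the question's data, reading beta1*X and beta2*X as the full linear predictors (intercept included; see the comment below) and assuming model and DF from the question are available:

B <- coef(model)             # one row per non-reference class (here b, c, d, e)
x <- c(1, DF$X[1])           # design row for observation 1: intercept term plus predictor
eta <- drop(B %*% x)         # linear predictors for the non-reference classes
denom <- 1 + sum(exp(eta))   # the leading 1 is exp(0) for the reference class a
p_other <- exp(eta) / denom  # pb, pc, ...
p_a <- 1 / denom             # pa = 1 minus the sum of the others
c(a = p_a, p_other)
fitted(model)[1, ]           # should match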
Gabriela
  • I think these equations in the above answer are missing the intercept coefficients, which need to be taken into account. See my answer below for more detail. – Nicholas G Reich Jan 23 '20 at 19:36

The key here is that in the help file for multinom() it says that "A log-linear model is fitted, with coefficients zero for the first class."

So that means the predicted values for the reference class can be calculated directly, assuming that the coefficients for class "a" are both zero. For example, for the sample row given above, we could calculate the predicted probability for class "a" using the softmax transform:

exp(0+0) / (exp(0+0) + exp(0.1756835 + 0.55915795*(-0.3271010)) + exp(-0.2513414 + (-0.31274745)*(-0.3271010)) + exp(0.1389806 + (-0.12257963)*(-0.3271010)) + exp(-0.4034968 + 0.06814379*(-0.3271010)))

or perhaps more simply, using non-hard-coded numbers, we can calculate the entire set of probabilities for the first row of data as:

softMax <- function(x){
    # exponentiate and normalize so the probabilities sum to 1
    expx <- exp(x)
    return(expx/sum(expx))
}
coefs <- rbind(c(0,0), coef(model))  # prepend the all-zero row for the reference class "a"
linear.predictor <- as.vector(coefs %*% c(1, -0.3271010))  # linear predictors for row 1
softMax(linear.predictor)

FWIW: the example in the original question does not reproduce exactly for me; the same seed gives me different random deviates. So I have reproduced the example freshly below, along with my calculations.

library(nnet)
set.seed(1)
DF <- data.frame(
    X = as.numeric(rnorm(30)), 
    Y = factor(sample(letters[1:5],30, replace=TRUE)))
DF$Y<-relevel(DF$Y, ref="a") #ensure a is the reference category
model <- multinom(Y ~ X, data = DF)
coef(model)
##   (Intercept)             X
## b -0.33646439  1.200191e-05
## c -0.36390688 -1.773889e-01
## d -0.45197598  1.049034e+00
## e -0.01418543  3.076309e-01

DF[1,]
##            X Y
## 1 -0.6264538 c

fitted.values(model)[1,]
##          a          b          c          d          e 
## 0.27518921 0.19656378 0.21372240 0.09076844 0.22375617 

coefs <- rbind(c(0,0), coef(model))
linear.predictor <- as.vector(coefs%*%c(1,DF[1,"X"]))
softMax(linear.predictor)
## [1] 0.27518921 0.19656378 0.21372240 0.09076844 0.22375617
Nicholas G Reich