The model cannot interpret your dishId column as the alternative index (alt.var) because the key pairs differ between choices. For example, the first choice in your .csv file uses "TS" and "RS" as alternative index keys, whereas choice 3634 uses "RR" and "RS". In addition, you did not specify the names of the alternatives (alt.levels). Because alt.levels is not filled in, mlogit.data tries to detect the alternatives from the alternative index, which it cannot interpret correctly. This is where everything goes wrong: the 'food' and 'plate' variables are not interpreted as alternatives but are treated as individual-specific variables, which eventually causes the singularity issues.
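One way to see the inconsistency for yourself is to list which key sets occur per choice. The following is only a sketch in base R: raw_toy is a hypothetical stand-in for your data, and it assumes that the individuals column identifies a choice and that dishId holds the keys.

```r
# Hypothetical toy data in long format: two rows per choice,
# with alternative keys that differ between choices.
raw_toy <- data.frame(
  individuals = c(1, 1, 2, 2, 3, 3),
  dishId      = c("TS", "RS", "TS", "RS", "RR", "RS"),
  stringsAsFactors = FALSE
)

# Collapse the sorted set of keys used within each choice into one string.
key_sets <- tapply(raw_toy$dishId, raw_toy$individuals,
                   function(k) paste(sort(unique(k)), collapse = "+"))

# More than one distinct key set means the column cannot serve as alt.var as-is.
distinct_sets <- unique(as.vector(key_sets))
consistent <- length(distinct_sets) == 1

print(table(key_sets))
print(consistent)
```

If consistent comes back FALSE on the real data, the keys need to be fixed or bypassed before mlogit.data can use them.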
You have two options to fix the issue. You can pass the actual alternatives to mlogit.data through the alt.levels parameter:
library(mlogit)
# Name the alternatives explicitly instead of relying on the inconsistent keys
TM <- mlogit.data(raw, choice = "selected", shape = "long", alt.levels = c("food", "plate"), chid.var = "individuals", drop.index = TRUE)
model1 <- mlogit(selected ~ food + plate | sex + age + hand, data = TM)
Alternatively, you can make your index keys consistent so that you can pass them via alt.var. mlogit.data will then be able to infer correctly what your alternatives are:
raw[, 3] <- rep(1:2, nrow(raw) / 2) # use 1 and 2 as unique alternative keys for all choices
TM <- mlogit.data(raw, choice = "selected", shape = "long", alt.var = "dishId", chid.var = "individuals")
model2 <- mlogit(selected ~ food + plate | sex + age + hand, data = TM)
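The rep(1:2, ...) line only works if every choice has exactly two rows and they are stored next to each other. A slightly more defensive variant numbers the rows within each choice. This is a sketch on hypothetical toy data (raw_toy and altKey are illustrative names), and like the line above it assumes that the rows within each choice appear in the same alternative order.

```r
# Hypothetical toy data: two rows per choice, inconsistent keys in dishId.
raw_toy <- data.frame(
  individuals = c(10, 10, 11, 11, 12, 12),
  dishId      = c("TS", "RS", "TS", "RS", "RR", "RS"),
  stringsAsFactors = FALSE
)

# Number the rows within each choice: 1 for the first row, 2 for the second.
raw_toy$altKey <- ave(seq_len(nrow(raw_toy)), raw_toy$individuals,
                      FUN = seq_along)

# Sanity check: every choice must contribute exactly two rows.
rows_per_choice <- table(raw_toy$individuals)
stopifnot(all(rows_per_choice == 2))

print(raw_toy)
```

The new column could then be passed as alt.var in place of the overwritten dishId column.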
We verify that both models are indeed identical. The results of model 1:
> summary(model1)
Call:
mlogit(formula = selected ~ food + plate | sex + age + hand,
data = TM, method = "nr", print.level = 0)
Frequencies of alternatives:
food plate
0.42847 0.57153
nr method
4 iterations, 0h:0m:0s
g'(-H)^-1g = 0.00423
successive function values within tolerance limits
Coefficients :
                    Estimate Std. Error t-value  Pr(>|t|)
plate:(intercept) -0.0969627  0.0764117 -1.2689 0.2044589
foodCirc           1.0374881  0.0339559 30.5540 < 2.2e-16 ***
plateCirc         -0.0064866  0.0524547 -0.1237 0.9015835
plate:sexmale     -0.0811157  0.0416113 -1.9494 0.0512512 .
plate:age16-34     0.1622542  0.0469167  3.4583 0.0005435 ***
plate:age35-54     0.0312484  0.0555634  0.5624 0.5738492
plate:age55-74     0.0556696  0.0836248  0.6657 0.5055987
plate:age75+       0.1057646  0.2453797  0.4310 0.6664508
plate:handright   -0.0177260  0.0539510 -0.3286 0.7424902
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -8284.6
McFadden R^2: 0.097398
Likelihood ratio test : chisq = 1787.9 (p.value = < 2.22e-16)
Compare these with the results of model 2. Note that the alternatives are correctly identified, but they are labelled by their keys (1 and 2) rather than by name:
> summary(model2)
Call:
mlogit(formula = selected ~ food + plate | sex + age + hand,
data = TM, method = "nr", print.level = 0)
Frequencies of alternatives:
1 2
0.42847 0.57153
nr method
4 iterations, 0h:0m:0s
g'(-H)^-1g = 0.00423
successive function values within tolerance limits
Coefficients :
                 Estimate Std. Error t-value  Pr(>|t|)
2:(intercept)  -0.0969627  0.0764117 -1.2689 0.2044589
foodCirc        1.0374881  0.0339559 30.5540 < 2.2e-16 ***
plateCirc      -0.0064866  0.0524547 -0.1237 0.9015835
2:sexmale      -0.0811157  0.0416113 -1.9494 0.0512512 .
2:age16-34      0.1622542  0.0469167  3.4583 0.0005435 ***
2:age35-54      0.0312484  0.0555634  0.5624 0.5738492
2:age55-74      0.0556696  0.0836248  0.6657 0.5055987
2:age75+        0.1057646  0.2453797  0.4310 0.6664508
2:handright    -0.0177260  0.0539510 -0.3286 0.7424902
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -8284.6
McFadden R^2: 0.097398
Likelihood ratio test : chisq = 1787.9 (p.value = < 2.22e-16)
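Rather than comparing the two summaries by eye, the estimates can be compared programmatically. The sketch below uses the first three estimates copied from the output above so that it runs on its own. The variable names are illustrative, and the commented last line assumes that coef() is applied to the two fitted objects.

```r
# First three estimates copied from the two summaries above.
coef_model1 <- c("plate:(intercept)" = -0.0969627,
                 foodCirc            =  1.0374881,
                 plateCirc           = -0.0064866)
coef_model2 <- c("2:(intercept)" = -0.0969627,
                 foodCirc        =  1.0374881,
                 plateCirc       = -0.0064866)

# The labels differ ("plate:" versus "2:"), so compare the values only.
same_values <- isTRUE(all.equal(unname(coef_model1), unname(coef_model2)))
same_names  <- identical(names(coef_model1), names(coef_model2))

print(same_values)
print(same_names)

# With the fitted objects at hand, the equivalent check would be:
# all.equal(unname(coef(model1)), unname(coef(model2)))
```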