How to test the significance of the difference in mean estimates in R?

Question

Solved: I'm working on market research and want to fit a multinomial logit model in R based on the specification below. The study is designed to test whether consumers' exposure to specific advertisements influences their preference for an attribute, namely whether the product is equipped with Bluetooth. Because the model contains a dummy for whether the product has Bluetooth, I'm not sure how to translate it into R code. (The model specification was posted as an image.)

Update: I have now built the correct mixed MNL model and want to test whether the difference between two mean estimates is significant, for example between “Group5:NoBluetooth” and “Group6:NoBluetooth”. Which test should I use here, and how do I run it in R? The R results are shown in the following image.

score 0 · Answer 1 · answered Aug 25 '22 at 07:00

You can implement dummy variables the same way as in an ordinary lm() model. Here is a reproducible example that you should be able to copy, paste, and run. For your own data, just make sure that your Bluetooth dummy variable is coded as a factor, like the wifi variable in my example.

library(tidyverse)
library(gmnl)
library(mlogit)

# importing sample data
data("TravelMode", package = "AER")

# adding an additional column for wifi
TravelMode <- TravelMode %>%
  mutate(wifi = as.factor(case_when(mode == "air" ~ "Yes",
                         mode == "train" ~ "Yes",
                         mode == "bus" ~ "Yes",
                         mode == "car" ~ "No")))

# transforming the data frame into mlogit.data format (required by gmnl)
TM1 <- mlogit.data(TravelMode, choice = "choice", shape = "long",
                  alt.levels = c("air", "train", "bus", "car"))

# fitting model with dummy variable "wifi" and a scaled version of the variable "travel"-time.
mixl1 <- gmnl(choice ~ wifi + scale(travel), data = TM1)

# summary output
summary(mixl1)
#> 
#> Model estimated on: Do Aug 25 08:53:55 2022 
#> 
#> Call:
#> gmnl(formula = choice ~ wifi + scale(travel), data = TM1, method = "nr")
#> 
#> Frequencies of categories:
#> 
#>     air   train     bus     car 
#> 0.27619 0.30000 0.14286 0.28095 
#> 
#> The estimation took: 0h:0m:0s 
#> 
#> Coefficients:
#>                      Estimate  Std. Error z-value  Pr(>|z|)    
#> train:(intercept)     1.89032     0.42585  4.4389 9.042e-06 ***
#> bus:(intercept)       1.22794     0.45919  2.6741  0.007492 ** 
#> car:(intercept)       0.82882 21355.60419  0.0000  0.999969    
#> wifiYes              -0.84708 21355.60419  0.0000  0.999968    
#> scale(travel)        -1.07513     0.21390 -5.0265 4.996e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Optimization of log-likelihood by Newton-Raphson maximisation
#> Log Likelihood: -269.57
#> Number of observations: 210
#> Number of iterations: 4
#> Exit of MLE: gradient close to zero (gradtol)

Created on 2022-08-25 with reprex v2.0.2

Hi Noah, thank you so much for the reply! It is helpful. But what about combining two dummies, i.e., advertising and Bluetooth, into one variable of the model, as in the image I posted in the question? I want to implement the model $u_{ij} = \sum \beta^{Ad} x_j I_i^{Ad} + \alpha\,\mathrm{Price}_j + \gamma Z_j + \epsilon_{ij}$, where $I_i^{Ad}$ is the dummy for exposure to the advertisement, $\beta^{Ad}$ is the average preference coefficient for Bluetooth, $x_j$ captures whether Bluetooth is equipped or not, and $Z_j$ is a vector for color and screen size. — Sissie, Aug 25 '22 at 11:28
Happy I could help. You should be able to just add these variables to your formula as you would for any other model: `gmnl(choice ~ bluetooth + adv_exposure + color + screensize, data = your_data)`. — Noah, Aug 25 '22 at 11:40
I tried adding the variables to the formula in R and got the error "Error in solve.default(H, g[!fixed]) :" — Sissie, Aug 25 '22 at 12:49

score 0 · Answer 2 · answered Aug 25 '22 at 13:02

I want to expand on the previous answer, because I feel there's a problem with your model. Noah suggested this code:

gmnl(choice ~ bluetooth + adv_exposure + color + screensize, data = your_data)

This gives you a utility coefficient for the Bluetooth option and a utility coefficient for the ad. That is not exactly what your model describes, since you aren't interested in the utility of the ad itself. What you seem to be interested in is the interaction between one attribute (Bluetooth) and a covariate (ad exposure). Using the `:` operator for an interaction-only term (in an R formula, `*` would also add both main effects), the model you posted would be coded like this:

gmnl(choice ~ bluetooth:adv_exposure + price + color + screensize, data = your_data)

However, I feel there's a problem with that utility model: it makes no distinction between the utility of the option itself and the extra utility brought by exposure to the ad. To speak plainly, according to it, the utility of having the Bluetooth option is zero if you haven't seen the ad. I doubt that is true, and I doubt it is what you intend. If you offered me, or anyone else here who hasn't seen the ad, a choice between two identical items at identical prices, one with Bluetooth and one without, we would probably all choose the one with the extra option. That means the option has a utility of its own, with or without the ad.

I would suggest including Bluetooth as a predictor in its own right and adding the interaction between the two terms as a further predictor (written with `:` so that no separate main effect for ad exposure is added), like this:

gmnl(choice ~ bluetooth + bluetooth:adv_exposure + price + color + screensize, data = your_data)
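
Written out, this suggestion amounts to adding a main effect for Bluetooth to the utility given in the question's comment. A sketch, where $\beta_0$ is a symbol introduced here for illustration (it does not appear in the original post) and stands for the baseline Bluetooth preference without ad exposure:

```latex
u_{ij} = \beta_0 x_j + \sum \beta^{Ad} x_j I_i^{Ad} + \alpha\,\mathrm{Price}_j + \gamma Z_j + \epsilon_{ij}
```

Under this form, $\beta^{Ad}$ would measure only the additional preference for Bluetooth associated with ad exposure.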

Please note that this model assumes that the ad affects only one of the parameters. Depending on the ad, that may be a strong assumption.
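
As a side check on the formula syntax, the terms that R generates from the `*` and `:` operators can be inspected with base R alone. This is only a sketch: `y`, `bluetooth`, and `adv_exposure` are placeholder names, and no model is fitted.

```r
# Inspect how R expands formula operators (base R only; names are placeholders)
f_star  <- y ~ bluetooth * adv_exposure
f_colon <- y ~ bluetooth + bluetooth:adv_exposure

# `*` expands to both main effects plus the interaction
labels_star <- attr(terms(f_star), "term.labels")
print(labels_star)

# `:` adds only the interaction term to whatever else is listed
labels_colon <- attr(terms(f_colon), "term.labels")
print(labels_colon)
```

The first formula yields three terms (both main effects and the interaction), the second only the Bluetooth main effect and the interaction. A main effect for a variable that is constant across alternatives, such as ad exposure, may not be identifiable in a choice model, which is one possible reason to prefer `:` here.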

Hi, thanks for your reply! As I'm using the package "mlogit", my code is as follows: mnl_dummy <- mlogit(Choice ~ None + Price + Bluetooth_NO + color_red + color_black + screensize_small + screensize_medium + TreatmentGroup:Bluetooth_NO | 0, data = myData_ml). It runs without errors, so I think it works now! Thanks for the help! — Sissie, Aug 25 '22 at 14:31
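
On the update in the question (whether two mean estimates differ significantly), one common approach is a Wald z-test on the difference, using the estimated covariance matrix so that the covariance between the two estimates is accounted for. The sketch below is not from the thread: `wald_diff` is a hypothetical helper, it assumes the fitted model supports `coef()` and `vcov()`, and the demo uses a binomial `glm` on the built-in `mtcars` data purely as a stand-in for the choice model.

```r
# Wald z-test for the difference between two estimated coefficients.
# Assumption: coef(model) and vcov(model) return named estimates and
# their covariance matrix for the fitted model.
wald_diff <- function(model, name1, name2) {
  b <- coef(model)
  V <- vcov(model)
  d <- unname(b[name1] - b[name2])
  # Var(b1 - b2) = Var(b1) + Var(b2) - 2 * Cov(b1, b2)
  se <- sqrt(V[name1, name1] + V[name2, name2] - 2 * V[name1, name2])
  z <- d / se
  c(estimate = d, se = se, z = z, p_value = 2 * pnorm(-abs(z)))
}

# Stand-in demo on built-in data (not the poster's data or model)
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
res <- wald_diff(fit, "wt", "hp")
print(res)
```

For the model in the question, the call would look something like `wald_diff(your_fit, "Group5:NoBluetooth", "Group6:NoBluetooth")`, where `your_fit` is a placeholder for the fitted object and the coefficient names must match those shown in `summary()`.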
