3

Recently, I had to work with R and SPSS to analyze a dataset with Multinomial Regression framework. We surveyed some participants (10-12 years old) and we asked which "professional field" they like the most, then we asked how often they accessed the internet. So, the outcome is a Categorical variable": professional field -- "military", "I don't know", and "Other profession"; and the independent variable is also a categorical variable (how often do you access the internet ("I don't have access", "1-3 hours/day", "3-5 hours/day").

I ran a model using R (with nnet package, via multinom function), and other statistician ran using SPSS. All reference category was defined correctly.

Now, when we compare the results, they don't agree for the second category of my independent variable. The first one is ok.

Please have a look at the entire code:

library(tidyverse)
library(stargazer)
library(nnet)

ds <- ds %>% mutate(internet = factor(internet))
ds <- ds %>% mutate(internet = relevel(internet, ref = "I dont have internet access"))

ds <- ds %>% mutate(field = factor(field))
ds <- ds %>% mutate(fielf = relevel(field, ref = "I dont know"))

mod <- multinom(field ~ internet, data = ds, maxit=1000, reltol = 1.0e-9)
stargazer(mod, type = 'text')

and the SPSS result SPSS results

For the sake of clarity, when the independent variable has only two categories (such as sex, male and female), both R and SPSS agree with its results

SPSS results 2

After a huge effort trying to understand the discrepancy between both results, I read that nnet estimation could have some problems (optimization problem ?), and that the discrepancy of results is not so strange as I was thinking at the beginning..

Can someone explain to me what's going on here? What am I missing?! I assume SPSS and R must have the same results if we are running the same model.

Thank you

That's the ds I'm using in this example:

ds <- structure(list(sex = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 
                                             2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 
                                             2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                             2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
                                             1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
                                             2L, 1L), .Label = c("male", "female"), class = "factor"), internet = structure(c(3L, 
                                                                                                                              3L, 2L, 3L, 2L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 3L, 2L, 
                                                                                                                              2L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 1L, 3L, 2L, 2L, 2L, 3L, 3L, 3L, 
                                                                                                                              2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 2L, 3L, 3L, 2L, 2L, 
                                                                                                                              2L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 3L, 1L, 2L, 2L, 2L, 3L, 3L, 2L, 
                                                                                                                              2L, 1L, 3L, 2L, 2L, 3L, 2L, 2L), .Label = c("I dont have internet access", 
                                                                                                                                                                          "1-3 hours/day", "3-5 hours/day"), class = "factor"), field = structure(c(1L, 
                                                                                                                                                                                                                                                    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                                                                                                                                                                                                                                    1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                                                                                                                                                                                                                                    1L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 
                                                                                                                                                                                                                                                    1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
                                                                                                                                                                                                                                                    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("I dont know", "Military", 
                                                                                                                                                                                                                                                                                                "Other profession"), class = "factor")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                                                                                                                                                                             -73L))
Luis
  • 1,388
  • 10
  • 30

1 Answers1

0

You could use mlogit alternatively which resembles the SPSS results more closely. The SPSS values should be valid, since Stata yields similar results (-14.88 (982.95), 11.58 (982.95), 11.44 (982.95)). The remaining deviations might stem from the ridiculous significance of "other profession".

library(mlogit)
ml.dat <- mlogit.data(ds, choice="field", shape="wide")
ml <- mlogit(field ~ 1 | internet, data=ml.dat)

Yielding

texreg::screenreg(ml)
=========================================================
                                                Model 1  
---------------------------------------------------------
Military:(intercept)                               -0.41 
                                                   (0.91)
Other profession:(intercept)                      -16.89 
                                                (2690.89)
Military:factor(internet)1-3 hours/day             -1.50 
                                                   (1.06)
Other profession:factor(internet)1-3 hours/day     13.60 
                                                (2690.89)
Military:factor(internet)3-5 hours/day             -1.64 
                                                   (1.06)
Other profession:factor(internet)3-5 hours/day     13.46 
                                                (2690.89)
---------------------------------------------------------
AIC                                                85.49 
Log Likelihood                                    -36.74 
Num. obs.                                          73    
=========================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • Thanks! Now the results are close, but I'm still trying to discover what's going on here. I see R and SPSS Information Criteria are very very different (R AIC = 85.49 vs SPSS AIC = 24). By the way, the texreg package produces a nice table, pretty similar than stargazer! I didn't know this package, thank you! – Luis Apr 24 '19 at 15:35