
I am working with election survey data: I have a dataset loaded into R and have created some objects, and I am working in the tidyverse. I am trying to run a regression with male and another variable. However, male is one value of the gender variable, and I am trying to isolate just male from gender overall. In the data, male is coded as 1 and female as 2. When I run the regression, I get these warnings:

Warning messages:
1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors

I do get coefficients, but then I try to get a summary:

gender_disgust_set<-lm(gender~dem_disgusted, data=my_data_set)
summary(gender_disgust_set)

and then I get this error:

Error in quantile.default(resid) : (unordered) factors are not allowed
In addition: Warning message:
In Ops.factor(r, 2) : ‘^’ not meaningful for factors

lm(gender~dem_disgusted, data=my_data_set)
subset(my_data_set, gender = male)
    
male_total<-subset(my_data_set, gender = male)
summary(male_total)
lm(gender~dem_disgusted, data=my_data_set)
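A note on the `subset()` lines above: `gender = male` uses `=`, which names an argument instead of testing equality, so as far as I can tell `subset()` passes it through `...` and ignores it, returning the whole data set. Filtering needs `==`, and the value has to match how gender is actually stored. A minimal sketch, assuming gender is coded 1 = male and 2 = female as described in the question; the stand-in data frame and the `male` column are hypothetical. If `levels(my_data_set$gender)` shows labels such as `1.Male`, compare against that exact label instead of `1`.

```r
library(dplyr)

# Hypothetical stand-in for the survey data: 1 = male, 2 = female
my_data_set <- data.frame(
  gender = factor(c(1, 2, 1, 1, 2, 2, 1)),
  dem_disgusted = c(3, 1, 4, 2, 5, 2, 3)
)

# Keep only male respondents; `==` tests equality, `=` does not
male_total <- subset(my_data_set, gender == 1)
male_total_dplyr <- my_data_set %>% filter(gender == 1)

# Alternatively, keep everyone and add a 0/1 indicator for male
my_data_set <- my_data_set %>% mutate(male = as.integer(gender == 1))
```

A 0/1 indicator like `male` keeps all respondents in the data, which is usually what a regression on gender needs.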
Asked by Irene (edited by jrcalabrese)
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. What types of values are in the "gender" column? Are they numeric? If you are trying to predict one of two possible outcomes, logistic regression might be more appropriate than linear regression. – MrFlick Nov 29 '22 at 15:23
  • The values in the "gender" column are 1.Male and 2.Female. I am trying to compare both of them with dem_disgusted (the standard survey question asking how disgusted, hopeful, etc. a person is with the candidate). I am trying to show the relationship between gender (male and female) and their level of disgust with the candidate. Later on, I also want to look at the feeling thermometer and the strength of a respondent's preference for their candidate choice. – Irene Nov 29 '22 at 16:08
  • OK. It seems like you may have your response and covariates swapped in your model, then. Right now you are trying to predict gender based on a level of disgust, which seems flipped. Perhaps you need some statistical modeling advice from [Cross Validated](https://stats.stackexchange.com/) first. – MrFlick Nov 29 '22 at 16:21
  • I'm not entirely sure what your model is trying to do. As it stands, your model predicts someone's gender based on their values for the variable "dem_disgusted". Is this the intended behavior? – David Nov 29 '22 at 17:02
  • It's supposed to go the other way around, so thanks for pointing this out. It's supposed to predict voting behavior based on gender, and then I was going to run a few other models. – Irene Dec 02 '22 at 14:24
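With the model turned around, gender becomes the predictor and the survey item the response. A minimal sketch, assuming `dem_disgusted` is stored as a numeric rating (the values below are made up) and giving the 1/2 codes readable labels with base R's `factor(..., labels =)`; if gender already arrives as a factor with labels like `1.Male`, the relabelling step may be unnecessary. Whether a linear model suits an ordinal survey item is the kind of question MrFlick suggests taking to Cross Validated.

```r
# Hypothetical toy data: gender coded 1 = male, 2 = female;
# dem_disgusted assumed to be a numeric rating
toy_survey <- data.frame(
  gender = c(1, 2, 2, 1, 2, 1, 1, 2),
  dem_disgusted = c(2, 4, 5, 1, 3, 2, 3, 4)
)

# Readable labels; "Male" (code 1) stays the reference level
toy_survey$gender <- factor(toy_survey$gender, levels = c(1, 2),
                            labels = c("Male", "Female"))

# Response on the left of ~, predictor on the right
fit <- lm(dem_disgusted ~ gender, data = toy_survey)
summary(fit)  # works because the response is numeric
```

The `genderFemale` coefficient is then the estimated difference in `dem_disgusted` between female and male respondents.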

1 Answer

The lm() function is designed for linear regression, which generally assumes a continuous response.

From the lm() details page:

A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response.

Your gender variable is a factor (not continuous; more information about data types here). If you really wanted to predict gender (a two-level factor), you would need logistic regression with glm() and family = "binomial".

Yes, you can use summary() on lm objects, but whether linear (or logistic) regression is best for your specific research question is a separate matter.
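To see what glm() does with a factor response, it can help to compare it with an explicit 0/1 outcome. The ?glm documentation says that for the binomial family a factor response is allowed, with the first level treated as failure and all other levels as success, so with levels 1 and 2 the model describes the probability that gender is 2. A small sketch on made-up data (the data frame `toy` and its values are hypothetical):

```r
# Hypothetical toy data: gender coded 1/2 as a factor, one numeric predictor
toy <- data.frame(
  gender = factor(c(1, 2, 2, 1, 2, 1, 1, 2, 2, 1)),
  x1 = c(3, 7, 6, 2, 8, 5, 4, 9, 3, 6)
)

class(toy$gender)  # "factor", which is why lm() complains

# Logistic regression with the factor response
fit_factor <- glm(gender ~ x1, data = toy, family = binomial)

# Same model with an explicit 0/1 response (1 = second level, "2")
toy$is_2 <- as.integer(toy$gender == "2")
fit_numeric <- glm(is_2 ~ x1, data = toy, family = binomial)

# The two sets of coefficients should agree
all.equal(coef(fit_factor), coef(fit_numeric))
```

If the other direction is wanted (probability of level 1), `relevel()` can change which level comes first.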

library(tidyverse)
set.seed(123)

gender <- sample(1:2, 10, replace = TRUE) %>% factor()
x1 <- sample(1:12, 10, replace = TRUE) %>% as.numeric()
x2 <- sample(1:100, 10, replace = TRUE) %>% as.numeric()
x3 <- sample(50:75, 10, replace = TRUE) %>% as.numeric()
my_data_set <- data.frame(gender, x1, x2, x3)
sapply(my_data_set, class)
#>    gender        x1        x2        x3 
#>  "factor" "numeric" "numeric" "numeric"

# error: lm() only warns about the factor response,
# but summary() then fails
# gender_disgust_set <- lm(gender ~ x1, data = my_data_set)
# summary(gender_disgust_set)

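# logistic regression: with a factor response, glm() treats the first
# level ("1") as failure, so this models the probability that gender is "2"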
gender_disgust_set1 <- glm(gender ~ x1, data = my_data_set, family = "binomial")
summary(gender_disgust_set1)
#> 
#> Call:
#> glm(formula = gender ~ x1, family = "binomial", data = my_data_set)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -1.2271  -0.9526  -0.8296   1.1571   1.5409  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)
#> (Intercept)   0.6530     1.7983   0.363    0.717
#> x1           -0.1342     0.2149  -0.625    0.532
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 13.46  on 9  degrees of freedom
#> Residual deviance: 13.06  on 8  degrees of freedom
#> AIC: 17.06
#> 
#> Number of Fisher Scoring iterations: 4

# or flip it around
# while this model works, please look into dummy-coding before using
# factors to predict continuous responses
gender_disgust_set2 <- lm(x1 ~ gender, data = my_data_set)
summary(gender_disgust_set2)
#> 
#> Call:
#> lm(formula = x1 ~ gender, data = my_data_set)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -4.500 -2.438  0.500  2.688  3.750 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)    8.500      1.371   6.199  0.00026 ***
#> gender2       -1.250      2.168  -0.577  0.58010    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.359 on 8 degrees of freedom
#> Multiple R-squared:  0.03989,    Adjusted R-squared:  -0.08012 
#> F-statistic: 0.3324 on 1 and 8 DF,  p-value: 0.5801
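On the dummy-coding comment in the code above: lm() turns a factor predictor into indicator columns using R's default treatment contrasts, with the first level (here `1`) as the reference. `model.matrix()` shows the columns lm() actually uses, and with a single factor predictor the intercept is the mean of the reference group while the `gender2` coefficient is the difference between the two group means. A sketch that regenerates `gender` and `x1` with the same seed and `sample()` calls as the answer (base R only; `x2` and `x3` are left out):

```r
set.seed(123)
gender <- factor(sample(1:2, 10, replace = TRUE))
x1 <- as.numeric(sample(1:12, 10, replace = TRUE))
my_data_set <- data.frame(gender, x1)

# Design matrix lm() builds: an intercept plus a 0/1 column "gender2"
model.matrix(~ gender, data = my_data_set)

fit <- lm(x1 ~ gender, data = my_data_set)

# Mean of x1 within each gender level
group_means <- tapply(my_data_set$x1, my_data_set$gender, mean)

# Intercept = mean for level "1"; gender2 = mean("2") - mean("1")
coef(fit)
group_means[["2"]] - group_means[["1"]]
```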
Answered by jrcalabrese
  • This is helpful! Thank you so much. What does set.seed(123) mean? I am familiar with glm and the tidyverse, but I haven't seen set.seed(123) before. – Irene Dec 02 '22 at 14:19
  • Hi Irene, I used the [`set.seed()` function](https://stackoverflow.com/questions/13605271/reasons-for-using-the-set-seed-function) just so my code can be exactly reproduced by others. I randomly generated data (`gender`, `x1`) and `set.seed()` makes it so others can get the exact same "random" data. – jrcalabrese Dec 02 '22 at 14:23
  • So I understand that gender is a factor in the data set. But I need it as the number or proportion of the sample. For example, I need male and female as their total numbers, not as individual variables. I need them to be vectors and not lists. – Irene Dec 02 '22 at 15:50
  • I recommend that you post a new question on [CrossValidated](https://stats.stackexchange.com/) to get help with what kind of analysis to conduct for your specific research question. Please note [lists are technically vectors](https://stackoverflow.com/questions/8594814/what-are-the-differences-between-vector-and-list-data-types-in-r) and as far as I can tell, your dataset contains no lists. Whether you should keep `gender` as a factor or [transform it into a proportion](https://stackoverflow.com/questions/24576515/relative-frequencies-proportions-with-dplyr) depends on your question. – jrcalabrese Dec 02 '22 at 17:04
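For the follow-up about getting male and female as totals or proportions of the sample: one way in the tidyverse, in the spirit of the relative-frequency question linked above, is to count rows per level and divide by the total. Base R's table() and prop.table() give the same numbers as a named one-dimensional table. A sketch, with a hypothetical toy data frame standing in for the survey data (1 = male, 2 = female assumed):

```r
library(dplyr)

# Hypothetical toy data: 1 = male, 2 = female
survey <- data.frame(gender = factor(c(1, 2, 1, 1, 2, 2, 1, 2, 1, 1)))

# Tidyverse: one row per level with its count and share of the sample
gender_props <- survey %>%
  count(gender) %>%
  mutate(prop = n / sum(n))
gender_props

# Base R: a named table of counts, then of shares
gender_counts <- table(survey$gender)
prop.table(gender_counts)

# A single number, e.g. the share of male respondents
male_share <- mean(survey$gender == 1)
```

Which of these fits depends on the analysis: the counts summarize the sample, while a regression would still use the per-respondent factor.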