The lm() function is designed for linear regression, which generally assumes a continuous response.
From the lm() details page:
A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response.
Your gender variable is a factor (categorical, not continuous). If you really wanted to predict gender (a factor), you would need to use glm() with family = "binomial" for logistic regression.
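One detail that is easy to miss: when the response is a factor, glm() with a binomial family treats the first factor level as "failure" and every other level as "success", so the coefficients describe the log-odds of not being in the first level. A minimal sketch with made-up data (toy_gender and toy_x are hypothetical and not taken from the question) that checks this against an explicit 0/1 response:

```r
# Hypothetical toy data, used only to show how glm() codes a factor response
toy_gender <- factor(c("f", "m", "f", "m", "m", "f", "m", "f"))
toy_x <- c(2, 9, 4, 3, 8, 1, 5, 7)

# The first level ("f") is the reference, i.e. the "failure" outcome
levels(toy_gender)

# Logistic regression with the factor as the response
fit_factor <- glm(toy_gender ~ toy_x, family = binomial)

# Fitted values are estimated probabilities of the non-reference level ("m")
fitted_prob <- fitted(fit_factor)

# The same model with an explicit 0/1 response (1 = "m")
fit_binary <- glm(as.integer(toy_gender == "m") ~ toy_x, family = binomial)

# Both fits should give the same coefficients
all.equal(unname(coef(fit_factor)), unname(coef(fit_binary)))
```

If the level you care about is the first one, relevel() or factor(..., levels = ...) can change which level is the reference before fitting.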
Yes, you can use summary() on lm objects, but whether linear (or logistic) regression is the best fit for your specific research question is a separate matter.
library(tidyverse)
set.seed(123)
gender <- sample(1:2, 10, replace = TRUE) %>% factor()
x1 <- sample(1:12, 10, replace = TRUE) %>% as.numeric()
x2 <- sample(1:100, 10, replace = TRUE) %>% as.numeric()
x3 <- sample(50:75, 10, replace = TRUE) %>% as.numeric()
my_data_set <- data.frame(gender, x1, x2, x3)
sapply(my_data_set, class)
#> gender x1 x2 x3
#> "factor" "numeric" "numeric" "numeric"
# does not work: lm() expects a numeric response, not a factor
# gender_disgust_set <- lm(gender ~ x1, data = my_data_set)
# summary(gender_disgust_set)
# logistic regression
gender_disgust_set1 <- glm(gender ~ x1, data = my_data_set, family = "binomial")
summary(gender_disgust_set1)
#>
#> Call:
#> glm(formula = gender ~ x1, family = "binomial", data = my_data_set)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.2271 -0.9526 -0.8296 1.1571 1.5409
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.6530 1.7983 0.363 0.717
#> x1 -0.1342 0.2149 -0.625 0.532
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 13.46 on 9 degrees of freedom
#> Residual deviance: 13.06 on 8 degrees of freedom
#> AIC: 17.06
#>
#> Number of Fisher Scoring iterations: 4
# or flip it around: predict the numeric variable from the factor
# while this model works, please look into dummy-coding before using
# factors to predict continuous responses
gender_disgust_set2 <- lm(x1 ~ gender, data = my_data_set)
summary(gender_disgust_set2)
#>
#> Call:
#> lm(formula = x1 ~ gender, data = my_data_set)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -4.500 -2.438 0.500 2.688 3.750
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 8.500 1.371 6.199 0.00026 ***
#> gender2 -1.250 2.168 -0.577 0.58010
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.359 on 8 degrees of freedom
#> Multiple R-squared: 0.03989, Adjusted R-squared: -0.08012
#> F-statistic: 0.3324 on 1 and 8 DF, p-value: 0.5801
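The gender2 row in the output above comes from R's default dummy (treatment) coding: lm() expands the factor into a 0/1 indicator column for each non-reference level. A small sketch with made-up data (toy_group and toy_y are hypothetical, not the data from the example) shows the design matrix and how the coefficients relate to the group means:

```r
# Hypothetical toy data, not the data from the example above
toy_group <- factor(c(1, 2, 1, 2, 2, 1))
toy_y <- c(8, 5, 10, 7, 6, 9)

# model.matrix() shows the columns lm() regresses on:
# an intercept plus a 0/1 indicator for level "2"
design <- model.matrix(~ toy_group)
design

fit_lm <- lm(toy_y ~ toy_group)
group_means <- tapply(toy_y, toy_group, mean)

# Intercept = mean of the reference level "1" (here 9)
# toy_group2 = mean of level "2" minus mean of level "1" (here 6 - 9 = -3)
coef(fit_lm)
group_means
```

With a two-level factor this is the same comparison as an equal-variance two-sample t-test, which is why the p-value for the dummy coefficient matches the overall F-test in the summary.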