
Let's say I have the following data frame:

weight <- c(100, 137, 158, 225, 149)
age <- c(15, 18, 21, 31, 65)
gender <- c("Female", "Male", "Male", "Male", "Female")
table <- data.frame(weight, age, gender)

If I wanted to do a linear regression to see how weight predicts age, as well as examine it, I'd do:

allData <- lm(age ~ weight, data = table)
summary(allData)

What do I do if I want to examine how weight predicts age for females only? That is, fit the model using only the female rows of the data? I'm thinking something like:

FemaleData <- lm(age ~ weight, data=table (gender="Female"))
Kyle L
  • Try `FemaleData <- lm(age ~ weight, data=table[table$gender == "Female",])` – Glaud Nov 04 '17 at 16:38
  • Perfect, Thanks!!!!!!!!! – Kyle L Nov 04 '17 at 16:46
  • Another way is using `dplyr` package. It's easier to generalise as it will create a linear regression model for each value of the variable you want to split your dataset. Check here: https://stackoverflow.com/questions/22713325/fitting-several-regression-models-with-dplyr – AntoniosK Nov 04 '17 at 17:20
  • @AntoniosK do you mind expanding? How would I write the code using dplyr in my situation? Thanks! – Kyle L Nov 04 '17 at 19:19
  • This is nonsense. How on earth could weight possibly predict age? – jay.sf Nov 05 '17 at 09:05
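
The subsetting suggested in the first comment can be sketched in base R as below. This is a minimal illustration, not part of the original thread: it assumes the question's data with the quoting in `gender` corrected, and the names `female_fit` and `female_fit2` are made up for the example. The second variant uses the `subset` argument of `lm()`, which should give the same fit without indexing the data frame by hand.

```r
# Data from the question (quotes in `gender` fixed so the code parses)
weight <- c(100, 137, 158, 225, 149)
age <- c(15, 18, 21, 31, 65)
gender <- c("Female", "Male", "Male", "Male", "Female")
table <- data.frame(weight, age, gender)

# Option 1: keep only the female rows, then fit
female_fit <- lm(age ~ weight, data = table[table$gender == "Female", ])

# Option 2: let lm() do the filtering through its `subset` argument
female_fit2 <- lm(age ~ weight, data = table, subset = gender == "Female")

# Both fits use the same two rows, so the coefficients should agree
coef(female_fit)
coef(female_fit2)
```

With only two female rows in this toy data the line passes through both points exactly, so `summary()` has no residual degrees of freedom to report standard errors from; a real dataset would need more observations per group.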

1 Answer

library(dplyr)
library(broom)

# example dataset
weight <- c(100, 137, 158, 225, 149, 148)
age <- c(15, 18, 21, 31, 65, 64)
gender <- c("Female", "Male", "Male", "Male", "Female", "Female")
table <- data.frame(weight, age, gender)

# build model for each gender value and store it in a column
table %>%
  group_by(gender) %>%                                  # for each gender value
  do(model = summary(lm(age ~ weight, data = .))) %>%   # build a model
  ungroup() -> tbl_models

# check what your new dataset looks like
tbl_models

# # A tibble: 2 x 2
#     gender            model
#   * <fctr>           <list>
#   1 Female <S3: summary.lm>
#   2   Male <S3: summary.lm>

# access / view model for Females
tbl_models %>% filter(gender == "Female") %>% pull(model)

# [[1]]
# 
# Call:
#   lm(formula = age ~ weight, data = .)
# 
# Residuals:
#   1          2          3 
# -0.0002125 -0.0101997  0.0104122 
# 
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)    
#   (Intercept) -8.706e+01  4.943e-02   -1761 0.000361 ***
#   weight       1.021e+00  3.681e-04    2773 0.000230 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.01458 on 1 degrees of freedom
# Multiple R-squared:      1,   Adjusted R-squared:      1 
# F-statistic: 7.69e+06 on 1 and 1 DF,  p-value: 0.0002296

# build model for each gender value and store it as a tidy dataset
table %>%
  group_by(gender) %>%
  do(tidy(lm(age ~ weight, data = .))) %>%
  ungroup()

# # A tibble: 4 x 6
#   gender        term    estimate    std.error   statistic      p.value
#   <fctr>       <chr>       <dbl>        <dbl>       <dbl>        <dbl>
# 1 Female (Intercept) -87.0609860 0.0494272875 -1761.39518 0.0003614292
# 2 Female      weight   1.0206120 0.0003680516  2773.01334 0.0002295769
# 3   Male (Intercept)  -2.3370680 0.2181313917   -10.71404 0.0592475719
# 4   Male      weight   0.1480985 0.0012299556   120.40961 0.0052869963
AntoniosK
  • Despite working code, this is very unusual. The normal way is just adding gender into your regression `lm(age ~ weight + gender, data = table)` and you can directly read out the effect for women (though it doesn't make sense that weight predicts age in any way). – jay.sf Nov 05 '17 at 09:12
  • Yes, that's the way to do it if you want one model and gender as a variable. It's not unusual to want to have one model for each gender. Or a model for each year, or month, in other applications. Or generally build a model for subsets of your dataset given a specific column. That's what the questioner wanted. – AntoniosK Nov 05 '17 at 11:01
  • Thank you both. I indeed wanted just one model (age is predicted by weight) and to examine that relationship amongst subgroups. Slight difference between that and testing to see how age is predicted by both weight and gender. In other words, as @AntoniosK said: building a model for subsets of my dataset. – Kyle L Nov 05 '17 at 15:58
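
To make the distinction discussed in these comments concrete, here is a hedged base R sketch using the answer's example data. The object names (`fits`, `joint`, `additive`, `female_slope`, `male_slope`) are invented for the illustration. It fits one model per gender with `split()` and `lapply()`, then a single model with a weight-by-gender interaction. The interaction model should reproduce the per-gender slopes as point estimates, while the additive model `age ~ weight + gender` forces one common slope and only shifts the intercept.

```r
# Data from the answer above
weight <- c(100, 137, 158, 225, 149, 148)
age <- c(15, 18, 21, 31, 65, 64)
gender <- c("Female", "Male", "Male", "Male", "Female", "Female")
table <- data.frame(weight, age, gender)

# One separate model per gender, base R only
fits <- lapply(split(table, table$gender),
               function(d) lm(age ~ weight, data = d))

# One joint model with a weight-by-gender interaction
joint <- lm(age ~ weight * gender, data = table)

# Per-gender slopes recovered from the joint model
# ("Female" is the reference level because the levels sort alphabetically)
female_slope <- unname(coef(joint)["weight"])
male_slope <- unname(coef(joint)["weight"] + coef(joint)["weight:genderMale"])

# Additive model for contrast: a single shared slope for both genders
additive <- lm(age ~ weight + gender, data = table)

sapply(fits, coef)   # intercept and slope for each separate fit
coef(additive)
```

Only the point estimates are expected to match between the separate fits and the interaction model. Standard errors will generally differ, because the joint model pools the residual variance across both groups.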