How to summarize a variable based on another column in R?

Question

I have a dataset that looks like this:

  study_id weight gender
1      100     55   Male
2      200     65 Female
3      300     84 Female
4      400     59   Male
5      500     62 Female
6      600     75   Male
7      700     70   Male

I would like to find the summary statistics (mean, median, etc.; everything that the summary() function gives) for the weight variable, computed separately for males and females.

How can I go about doing this?

Reproducible Data:

data <- data.frame(
    study_id = c("100", "200", "300", "400", "500", "600", "700"),
    weight = c("55", "65", "84", "59", "62", "75", "70"),
    gender = c("Male", "Female", "Female", "Male", "Female", "Male", "Male")
)

score 2 · Accepted Answer · answered Jul 12 '22 at 13:47

Although there are reasonable suggestions by harre, I prefer to do it this way:

library(dplyr)

data  |>
    group_by(gender)  |>
    mutate(weight = as.numeric(weight))  |>
    summarise(
        across(weight, list(mean = mean, median = median))
    )
# # A tibble: 2 x 3
#   gender weight_mean weight_median
#   <chr>        <dbl>         <dbl>
# 1 Female        70.3          65
# 2 Male          64.8          64.5

The advantage of summarise(across()) is that it extends easily: if you had 2 columns, or 5, you could replace across(weight, ...) with e.g. across(weight:height, ...) and get the same statistics for every selected column. There are more examples of this in the docs.
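To make the multi-column case concrete, here is a minimal sketch. The question's data has no height column, so the `height` values below are made up purely for illustration. Both columns are stored as character, so they are converted before summarising.

```r
library(dplyr)

# Data from the question plus a hypothetical `height` column (invented values)
data <- data.frame(
    study_id = c("100", "200", "300", "400", "500", "600", "700"),
    weight = c("55", "65", "84", "59", "62", "75", "70"),
    height = c("170", "160", "165", "180", "158", "175", "172"),
    gender = c("Male", "Female", "Female", "Male", "Female", "Male", "Male")
)

result <- data |>
    # Convert every column in the weight:height range from character to numeric
    mutate(across(weight:height, as.numeric)) |>
    group_by(gender) |>
    # One mean and one median per selected column, per gender
    summarise(
        across(weight:height, list(mean = mean, median = median))
    )

print(result)
```

With the default naming, the result columns should come out as weight_mean, weight_median, height_mean and height_median, one row per gender.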

score 1 · Answer 2 · answered Jul 12 '22 at 14:23

For a base R solution (literally replying to "everything that the summary() function gives"):

tapply(as.numeric(data$weight), INDEX = data$gender, FUN = summary)

$Female
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  62.00   63.50   65.00   70.33   74.50   84.00 

$Male
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  55.00   58.00   64.50   64.75   71.25   75.00
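If a single table is easier to work with than a list, one possible variation (a sketch, not the only way to do it) is to split the weights by gender, apply summary() to each piece, and transpose the result so that each gender is a row. The name `summary_table` is just an illustrative choice.

```r
# Data from the question (weight is stored as character)
data <- data.frame(
    study_id = c("100", "200", "300", "400", "500", "600", "700"),
    weight = c("55", "65", "84", "59", "62", "75", "70"),
    gender = c("Male", "Female", "Female", "Male", "Female", "Male", "Male")
)

# split() groups the numeric weights by gender; sapply() simplifies the
# per-group summaries into a matrix with one column per gender
by_gender <- sapply(split(as.numeric(data$weight), data$gender), summary)

# Transpose so that rows are genders and columns are the summary statistics
summary_table <- t(by_gender)

print(summary_table)
```

Individual values can then be pulled out by name, e.g. summary_table["Male", "Median"].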
