How to summarize key statistics by two variables?

Question

Here is some sample code:

dat = data.frame(income = c(100,200,300,400,500,600), 
                 sex = c("M","M","M", "F","F","F"), 
                 num.kid = c(1,2,3,1,2,3))

I want to produce a 2-dimensional table that summarizes the key statistics (e.g. mean and var) of income distribution by sex and num.kid.

For example, table(dat$sex, dat$num.kid) would give me a 2x3 table with sex as rows and num.kid as columns, but the table would be filled with the count of those combinations. How can I bring a third variable (e.g. income) into the table? How can I fill the table with mean or var of income by sex and num.kid? This is almost like filling out an Excel pivot table using R code.

Hi Ruser, could you please include a `data.frame` of the expected results? — Felix T., May 10 '19 at 22:34
Sounds like you really need something like `dplyr::group_by`/`dplyr::summarize`, base R's `by` or `ave`, or something similar from `data.table`? — r2evans, May 10 '19 at 22:57

score 1 · Accepted Answer · answered May 10 '19 at 23:03

Here's a sample using your data:

library(dplyr)
dat %>% 
  group_by(sex) %>%  
  summarise(mean = mean(income), 
            var = var(income),
            sd = sd(income))

You can put multiple fields in the group_by statement, e.g. group_by(sex, num.kid), to summarize by both variables.
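As a minimal sketch of grouping by both fields, assuming dplyr 1.0 or later (for the `.groups` argument): the result name `by_both` and the extra `n` column are illustrative additions, not part of the original answer. With the sample data each sex/num.kid combination holds a single row, so `var` comes back as `NA` for every group.

```r
library(dplyr)

# Sample data from the question
dat <- data.frame(income = c(100, 200, 300, 400, 500, 600),
                  sex = c("M", "M", "M", "F", "F", "F"),
                  num.kid = c(1, 2, 3, 1, 2, 3))

# One row per sex/num.kid combination (long format)
by_both <- dat %>%
  group_by(sex, num.kid) %>%
  summarise(mean = mean(income),
            var = var(income),   # NA when a group has only one row
            n = n(),             # group size, to spot single-row groups
            .groups = "drop")    # return an ungrouped tibble

print(by_both)
```

The output is in long format (one row per combination) rather than a sex-by-num.kid grid.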

Ryan John

Ugh, I know you can use the dplyr package, but I was wondering if there is a more basic command to do this. – Ruser May 10 '19 at 23:16
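For the base R route asked about here, one possible sketch uses tapply, which returns a matrix laid out like table(dat$sex, dat$num.kid) but filled with a statistic of income. This was not posted in the thread; the names `mean_tab`, `var_tab` and `long` are illustrative.

```r
# Sample data from the question
dat <- data.frame(income = c(100, 200, 300, 400, 500, 600),
                  sex = c("M", "M", "M", "F", "F", "F"),
                  num.kid = c(1, 2, 3, 1, 2, 3))

# Grouping variables: sex becomes the rows, num.kid the columns
groups <- list(sex = dat$sex, num.kid = dat$num.kid)

# 2 x 3 table of mean income per cell
mean_tab <- tapply(dat$income, groups, mean)

# Same layout with the variance; a cell with a single observation gives NA
var_tab <- tapply(dat$income, groups, var)

# Long-format alternative: one row per sex/num.kid combination
long <- aggregate(income ~ sex + num.kid, data = dat, FUN = mean)

print(mean_tab)
print(var_tab)
print(long)
```

Because the sample data has exactly one row per cell, every entry of `var_tab` is `NA`; with more observations per cell the same call fills in the variances.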
