Summarize one column, grouped by another in R

Question

I believe this is a simple question, but I can't think it through clearly enough to know what to search for.

I have a table of data:

Column 1: Sex (M/F)
Column 2: Plays Sport (Y/N)

I need a summary table which shows:

Sex | Plays Sport Yes | Plays Sport No

I can't for the life of me figure out how to do it with dplyr.

A solution in base R would be preferred if it isn't too complicated.

From your description it's not entirely clear what's in the data, e.g. what the column names are. Could you copy the first few rows and edit them into your question? See also: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — David Robinson, Nov 26 '18 at 20:43
Couple comments to improve your question: 1) Please provide the actual data in a reproducible format (https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), 2) `dplyr` is an R package which means it's not base R. Are you trying to find an answer using base R or are you trying to use dplyr? — Ben G, Nov 26 '18 at 20:44

score 2 · Answer 1 · answered Nov 26 '18 at 20:49

Using dplyr and making some assumptions about exactly what you're looking for:

library(tidyverse)

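# Simulate 10 people; rbinom() returns 0 or 1, so adding 1 gives an index of 1 or 2
data <- data.frame(Sex = c("M", "F")[rbinom(10, 1, 0.5) + 1],
                   PlaysSport = c(TRUE, FALSE)[rbinom(10, 1, 0.5) + 1])

# Count the rows in each Sex/PlaysSport combination
# (the data is random, so the counts differ between runs)
data %>% 
  group_by(Sex, PlaysSport) %>% 
  summarise(count = n())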

# A tibble: 4 x 3
# Groups:   Sex [?]
      Sex   PlaysSport count
    <fctr>    <lgl>    <int>
1      F      FALSE     1
2      F       TRUE     3
3      M      FALSE     4
4      M       TRUE     2
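
If you want one row per Sex with separate Yes and No columns, as in the question, one possible sketch is to sum the logical column directly instead of grouping by it. The fixed data and the names sports_df, PlaysSportYes and PlaysSportNo below are my own illustrative assumptions, not part of the answer above.

```r
library(dplyr)

# Hypothetical fixed data (not the random data above), so the result is reproducible
sports_df <- data.frame(Sex = c("M", "M", "F", "M", "F", "F"),
                        PlaysSport = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE))

wide <- sports_df %>%
  group_by(Sex) %>%
  summarise(PlaysSportYes = sum(PlaysSport),   # TRUE counts as 1
            PlaysSportNo  = sum(!PlaysSport))  # negate to count the FALSE rows
wide
```

This only works as written when PlaysSport is logical; for a "Y"/"N" character column you would compare first, e.g. sum(PlaysSport == "Y").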

score 2 · Answer 2 · answered Nov 26 '18 at 20:57

We can use count() with spread():

library(tidyverse)
df1 %>%
   count(Sex, Sport) %>% 
   spread(Sport, n, fill = 0)
# A tibble: 2 x 3
#  Sex       N     Y
#   <chr> <dbl> <dbl>
#1 F         2     0
#2 M         3     1

data

df1 <- data.frame(Sex = c("M", "M", "F", "M", "M", "F"),
            Sport = c("N", "Y", "N", "N", "N", "N"), stringsAsFactors = FALSE)
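
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), so the same idea can probably be sketched as below. It reuses the df1 from this answer; the "Sport_" column prefix is just an assumed naming choice for illustration.

```r
library(dplyr)
library(tidyr)

# Same dummy data as in the answer above
df1 <- data.frame(Sex = c("M", "M", "F", "M", "M", "F"),
                  Sport = c("N", "Y", "N", "N", "N", "N"),
                  stringsAsFactors = FALSE)

wide <- df1 %>%
  count(Sex, Sport) %>%                       # one row per Sex/Sport combination present
  pivot_wider(names_from = Sport,
              values_from = n,
              values_fill = list(n = 0L),     # combinations that never occur become 0
              names_prefix = "Sport_")        # assumed prefix, purely illustrative
wide
```

The named-list form of values_fill is used because it should be accepted by both older and newer tidyr releases.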

score 0 · Answer 3 · answered Nov 26 '18 at 20:52

You could use table():

# dummy data
df1 <- data.frame(Sex = c("M", "M", "F", "M", "M", "F"),
                Sport = c("N", "Y", "N", "N", "N", "N"), stringsAsFactors = FALSE)
df1
#  Sex Sport
#1   M     N
#2   M     Y
#3   F     N
#4   M     N
#5   M     N
#6   F     N

Result

table(df1)
#   Sport
#Sex N Y
#  F 2 0
#  M 3 1
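
Since the question prefers base R and asks for a Sex | Yes | No layout, here is a rough sketch of turning that table into an ordinary data frame. The output column names PlaysSportYes and PlaysSportNo are my assumption about what is wanted, and the explicit factor levels are there so that a "Y" or "N" column still exists even if one value never occurs in the data.

```r
# Same dummy data as above
df1 <- data.frame(Sex = c("M", "M", "F", "M", "M", "F"),
                  Sport = c("N", "Y", "N", "N", "N", "N"),
                  stringsAsFactors = FALSE)

# Cross-tabulate, fixing the Sport levels so both columns are always present
tab <- table(df1$Sex, factor(df1$Sport, levels = c("N", "Y")))

# as.data.frame.matrix() keeps the two-way layout (rows = Sex, columns = Sport)
counts <- as.data.frame.matrix(tab)

# Assumed output names, chosen to mirror the layout in the question
summary_df <- data.frame(Sex = rownames(counts),
                         PlaysSportYes = counts[["Y"]],
                         PlaysSportNo  = counts[["N"]])
summary_df
```

Calling as.data.frame() on the table instead would give the long format (one row per Sex/Sport pair), which is why the matrix method is used here.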

Here is another option with reshape2::dcast

reshape2::dcast(df1, Sex ~ paste0("Sport_", Sport),
                fun.aggregate = length)  # length is also the fallback when none is given

Result

#  Sex Sport_N Sport_Y
#1   F       2       0
#2   M       3       1
