This is a small data set from a fictional experiment with 40 observations (rows). The experiment has 2 conditions, each with a different number of subjects (8 subjects in total). Each subject is observed multiple times and therefore contributes a variable number of rows.

dat <- structure(list(subject = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L), 
    condition = c("a", "a", "a", "a", "a", "a", "a", "a", "a", 
    "a", "b", "b", "a", "a", "a", "a", "a", "a", "a", "b", "b", 
    "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "a", 
    "a", "a", "a", "a", "a", "a", "a"), DV = c(2.81157687969627, 
    -0.842120813381446, 0.581945736602951, 0.338837761518314, 
    1.89265238800308, 1.61748828762215, 1.50241281473164, -0.371722939264336, 
    2.34943581573083, 1.9748530958824, 0.362129637270942, 1.8277964140968, 
    1.70637518431997, 1.12865681599091, 3.05142782728916, 0.622010892882544, 
    2.00560122425538, -0.447121746565671, 1.15358864340752, 2.12585003262731, 
    1.52184076917827, -0.50606450477134, 0.345547956000384, 1.04829010205181, 
    3.0328567780456, 0.443519707656065, -0.57901488419535, 1.26806312350003, 
    2.47565945691539, 1.27802539397507, 1.47560146605553, -0.563842875341247, 
    -1.61470314081307, 0.293947258804903, 2.39827092020247, 2.05934478059775, 
    0.171958205176952, 1.62183818483135, 1.03045239398212, -0.0228550910766967
    )), row.names = c(NA, -40L), class = c("tbl_df", "tbl", "data.frame"
))

I would like to efficiently and tidily add a column that gives the number of distinct subjects in each condition. Currently, I am using this convoluted piece of code:

library(dplyr)
library(tidyr)
library(purrr)

dat %>%
  group_by(subject, condition) %>%
  nest() %>%
  group_by(condition) %>%
  nest() %>%
  mutate(n = map_dbl(data, nrow)) %>%
  unnest(data) %>%
  unnest(data)

Is there a better way to do this?

Nathan

1 Answer

Perhaps n_distinct() would help: group by condition, then count the distinct subjects within each group.

library(dplyr)
dat %>% 
    group_by(condition) %>% 
    mutate(n = n_distinct(subject))

NOTE: Because the original code groups and nests twice, its output has a different column order. If we put the columns in the same order and sort both results with arrange(subject, condition), all.equal returns TRUE when comparing the two.
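
The comparison described in the note can be sketched roughly as follows. This is only an illustration under a few assumptions: `toy` is a hypothetical data frame (not the question's `dat`), `tidy_up` is a made-up helper name, DV is added to the sort as a tie-breaker, and the grouping is dropped and `n` converted to integer so that only the values are compared.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data (not the question's `dat`): 3 subjects, 2 conditions
toy <- tibble(
  subject   = c(1L, 1L, 2L, 3L, 3L, 3L),
  condition = c("a", "a", "a", "b", "b", "b"),
  DV        = c(0.5, 1.2, -0.3, 2.1, 0.0, 1.7)
)

# Nested approach from the question
nested <- toy %>%
  group_by(subject, condition) %>%
  nest() %>%
  group_by(condition) %>%
  nest() %>%
  mutate(n = map_dbl(data, nrow)) %>%
  unnest(data) %>%
  unnest(data)

# n_distinct approach from the answer
direct <- toy %>%
  group_by(condition) %>%
  mutate(n = n_distinct(subject))

# Hypothetical helper: drop grouping, align column order, row order
# and the storage type of `n` before comparing
tidy_up <- function(x) {
  x %>%
    ungroup() %>%
    select(subject, condition, DV, n) %>%
    arrange(subject, condition, DV) %>%
    mutate(n = as.integer(n))
}

all.equal(tidy_up(nested), tidy_up(direct))
```

Note that the result of the n_distinct version is still grouped by condition; adding ungroup() at the end of the pipeline may be useful if later steps should not operate per condition.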

akrun