
I would like to apply a function to each group of a nested/grouped dataset using mutate(). The example below should help explain the goal. I need advice on how to code this correctly.

Create an example dataset that is grouped/nested, using the kidney data from the survival package.

library(survival)
library(dplyr)
library(tidyr)
library(purrr)

data(kidney)
grp_kidney <- kidney %>% group_by(sex) %>% nest()

Which has the following structure:

> grp_kidney
# A tibble: 2 x 2
# Groups:   sex [2]
    sex data             
  <dbl> <list>           
1     1 <tibble [20 × 6]>
2     2 <tibble [56 × 6]>

Next we make a function to fit survival curves to this data.

# Fit survival curves (one per distinct value of age); `ci` is passed to survfit()'s conf.type
sFit <- function(df, ci = 'none'){
  survfit(Surv(time, status) ~ age, data = df, conf.type = ci)
}
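Before mapping over the groups, it can help to confirm that sFit works on a single group's data frame. This is a minimal sketch. It assumes the survival, dplyr, and tidyr packages are installed and loads the data as survival::kidney instead of calling data(kidney). The names first_group, fit_plain, and fit_loglog are illustrative only.

```r
library(survival)  # Surv(), survfit(), and the kidney data
library(dplyr)
library(tidyr)

kidney <- survival::kidney
grp_kidney <- kidney %>% group_by(sex) %>% nest()

sFit <- function(df, ci = 'none'){
  survfit(Surv(time, status) ~ age, data = df, conf.type = ci)
}

# Pull out the first group's data frame (a plain tibble) and fit it directly,
# outside of mutate(), once for each CI type
first_group <- grp_kidney$data[[1]]
fit_plain  <- sFit(first_group, ci = 'plain')
fit_loglog <- sFit(first_group, ci = 'log-log')
```

If both calls return survfit objects, the function itself is fine and any remaining problem lies in how it is called inside mutate().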

Lastly, we apply this function to each row of the grouped data and save the result as a new column in the grouped tibble using purrr::map and dplyr::mutate.

grp_kidney <- grp_kidney %>%
  mutate(plain = map(grp_kidney$data, sFit, ci = 'plain')) %>%
  mutate(loglog = map(grp_kidney$data, sFit, ci = 'log-log'))

Error: Problem with `mutate()` input `plain`.
x Input `plain` can't be recycled to size 1.
ℹ Input `plain` is `map(grp_kidney$data, sFit, ci = "plain")`.
ℹ Input `plain` must be size 1, not 2.
ℹ The error occurred in group 1: sex = 1.
Run `rlang::last_error()` to see where the error occurred.

What I was hoping for in this example is a nested data frame with the following characteristics:

  1. One row per level of the grouping variable (2 rows in this example)
  2. Col 1: sex - 1 or 2 in this example
  3. Col 2: data - the data.frame for each group
  4. Col 3: plain - output of survfit model with plain CIs
  5. Col 4: loglog - output of survfit model with log-log CIs

I can make this work if I write two functions, one for 'plain' and one for 'log-log'. That seems wasteful; I would prefer to pass arguments to a single, more general function instead. I'd appreciate help from any coding experts.
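For reference, purrr::map() forwards any extra arguments given after the function to every call, which is what allows one general function to cover both CI types. The toy function scale_by below is hypothetical and only illustrates the mechanism; the anonymous-function form is an equivalent alternative.

```r
library(purrr)

# A hypothetical general-purpose function with an extra argument
scale_by <- function(x, k = 1) x * k

# Arguments after the function in map() are passed on to every call
via_dots <- map(list(1, 2, 3), scale_by, k = 10)

# Equivalent form that wraps the call in an anonymous function
via_lambda <- map(list(1, 2, 3), function(x) scale_by(x, k = 10))
```

Both forms produce the list 10, 20, 30; the same pattern lets `ci = 'plain'` or `ci = 'log-log'` reach sFit.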

Brant

1 Answer


I think I have found my error; the corrected code is:

grp_kidney <- grp_kidney %>%
  mutate(plain = map(data, sFit, ci = 'plain')) %>%
  mutate(loglog = map(data, sFit, ci = 'log-log'))
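A self-contained version of the corrected pipeline, with a check that the result has the shape described in the question (2 rows; columns sex, data, plain, loglog). This is a sketch assuming survival, dplyr, tidyr, and purrr are installed; it loads kidney as survival::kidney instead of calling data(kidney).

```r
library(survival)  # Surv(), survfit(), and the kidney data
library(dplyr)
library(tidyr)
library(purrr)

kidney <- survival::kidney

# Nest the data by sex; the nested tibble stays grouped by sex
grp_kidney <- kidney %>% group_by(sex) %>% nest()

# One general function; the CI type is passed in as an argument
sFit <- function(df, ci = 'none'){
  survfit(Surv(time, status) ~ age, data = df, conf.type = ci)
}

# Refer to the list column as `data`, so each group maps over its own row only
grp_kidney <- grp_kidney %>%
  mutate(plain  = map(data, sFit, ci = 'plain')) %>%
  mutate(loglog = map(data, sFit, ci = 'log-log'))

grp_kidney
```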

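The difference from my original code is that I refer to the list column as `data` rather than `grp_kidney$data`. Because grp_kidney is still grouped by sex, mutate() evaluates the expression once per group, and a bare `data` refers only to that group's single row of the column. `grp_kidney$data` instead pulls the whole two-element column from outside the grouped tibble, so map() returns a list of length 2 inside a group of size 1, which is exactly the "must be size 1, not 2" error.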
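The same behaviour can be reproduced without any survival models. This sketch uses a hypothetical two-row tibble named toy and assumes only dplyr is installed; it contrasts a bare column name, an outside `toy$x` reference, and the ungrouped case.

```r
library(dplyr)

toy <- tibble(g = c("a", "b"), x = c(10, 20)) %>% group_by(g)

# A bare column name is evaluated once per group, so `x` has length 1 here
ok <- toy %>% mutate(y = x * 2)

# `toy$x` is the full length-2 column taken from outside the group; it cannot
# be recycled to the group size of 1, so mutate() raises an error
bad <- tryCatch(toy %>% mutate(y = toy$x * 2), error = function(e) "error")

# After ungroup() the whole tibble is one group of size 2, so the outside
# reference happens to fit (though the bare name is still the cleaner choice)
ungrouped <- toy %>% ungroup() %>% mutate(y = toy$x * 2)
```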

Dharman