0

I've got a dataset to which I want to apply the following operations:

  1. Nest data by a set of columns
  2. Modify the 'data' list column to a wide format
  3. Use both the 'data' list column and values in the grouping columns to create a new column in the 'data' list column, using the columns created under 2. as inputs as well.

1 & 2 work fine using purrr and dplyr, but in step 3 when I try to reference one of the columns created in step 2 it gives me an 'object not found' error. It works fine if I reference only columns in the 'data' list column that already existed before step 2. If I inspect the contents of the list column after step 2 everything referenced in step 3 is there so why is the newly created column not recognized?

Reprex

library(tidyverse)

mtcarsT <- mtcars %>% as_tibble() %>%
  group_by(cyl, gear, vs) %>%
  mutate(cp_flag = rep(c('C', 'P'), length.out = n())) %>%
# step 1, works fine.
  nest() %>%
  mutate(data = map(data, ~ .x %>%
                      group_by(mpg, disp, hp, drat, am, carb) %>%
# step 2, also works fine, generates new columns 'wt_C', 'wt_P', 'qsec_C', 'qsec_P' in 'data'
                      pivot_wider(names_from = cp_flag, values_from = wt:qsec) %>%
                      ungroup()))

# > mtcarsT[1,]
# A tibble: 1 x 4
# Groups:   cyl, vs, gear [1]
#    cyl    vs  gear data             
#  <dbl> <dbl> <dbl> <list>           
# 1     6     0     4 <tibble [1 x 10]>
#
# > mtcarsT$data[[1]]
# A tibble: 1 x 10
#    mpg  disp    hp  drat    am  carb  wt_C  wt_P qsec_C qsec_P
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
#  1    21   160   110   3.9     1     4  2.62  2.88   16.5   17.0


# step 3A: this one works fine when only referencing columns in 'data' that already existed before step 2.
mtcarsT %>%
  mutate(data = pmap(.l = list(a = data, b = vs, c = gear),
                     .f = function(a, b, c) a %>% 
                       dplyr::mutate(vs_gear = carb - b + c)))  

# > .Last.value$data[[1]]
# A tibble: 1 x 11
#    mpg  disp    hp  drat    am  carb  wt_C  wt_P qsec_C qsec_P vs_gear
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>   <dbl>
# 1    21   160   110   3.9     1     4  2.62  2.88   16.5   17.0       8


# Step 3B: this is what I want to do, use the column 'wt_P' in the 'data' list column 
# that was created in step 2 along with other columns 'vs' and 'gear' in the nested tibble, 
# but it throws the error 'object wt_P not found'
mtcarsT %>%
  mutate(data = pmap(.l = list(a = data, b = vs, c = gear),
                     .f = function(a, b, c) a %>% 
                       dplyr::mutate(vs_gear = wt_P - b + c)))

# Error: object 'wt_P' not found
# Called from: mutate_impl(.data, dots, caller_env())


'''

I'm using R 3.6.2 x64 with tidyverse 1.3.0 inside RStudio 1.2.5033 on Windows 10.




Thijs
  • 179
  • 2
  • 9

1 Answers1

1

The example works as expected but the issue is present in step 1 itself.

To show that the example works you can change wt_P to wt_C and it works.

library(tidyverse)

mtcarsT %>%
   mutate(data = pmap(.l = list(a = data, b = vs, c = gear),
                      .f = function(a, b, c)
                             a %>% dplyr::mutate(vs_gear = wt_C - b + c)))

#     cyl    vs  gear data              
#   <dbl> <dbl> <dbl> <list>            
# 1     6     0     4 <tibble [1 × 11]> 
# 2     4     1     4 <tibble [8 × 11]> 
# 3     6     1     3 <tibble [2 × 11]> 
#....

The issue in step 1 is when you are doing

mtcars %>% as_tibble() %>%
    group_by(cyl, gear, vs) %>%
    mutate(cp_flag = rep(c('C', 'P'), length.out = n()))

there are certain groups with only 1 observation who do not get P value at all.

mtcars %>%  count(cyl, gear, vs)

# A tibble: 10 x 4
#     cyl  gear    vs     n
#   <dbl> <dbl> <dbl> <int>
# 1     4     3     1     1
# 2     4     4     1     8
# 3     4     5     0     1
# 4     4     5     1     1
# 5     6     3     1     2
# 6     6     4     0     2
# 7     6     4     1     2
# 8     6     5     0     1
# 9     8     3     0    12
#10     8     5     0     2 

Hence, wt_P is not calculated for them and it returns an error whereas wt_C does not. If you change order in rep from c('C', 'P') to c('P', 'C') then you will get error for wt_C and wt_P would work as expected.


To add the missing column, we can do :

mtcars %>%
   group_by(cyl, gear, vs) %>% 
   mutate(cp_flag = rep(c('C', 'P'), length.out = n())) %>% 
   nest() %>% 
   mutate(data = map(data, ~{ 
                       temp <- .x %>% 
                       group_by(mpg, disp, hp, drat, am, carb) %>% 
                       pivot_wider(names_from = cp_flag, values_from = wt:qsec, 
                       values_fill = list(wt = NA, qsec = NA)) %>% 
                       ungroup()
                       temp[setdiff(cols, names(temp))] <- NA;temp
         })) 


#     cyl    vs  gear data              
#   <dbl> <dbl> <dbl> <list>            
# 1     6     0     4 <tibble [1 × 10]> 
# 2     4     1     4 <tibble [8 × 10]> 
# 3     6     1     3 <tibble [2 × 10]> 
# 4     8     0     3 <tibble [12 × 10]>
# 5     6     1     4 <tibble [2 × 10]> 
# 6     4     1     3 <tibble [1 × 10]> 
# 7     4     0     5 <tibble [1 × 10]> 
# 8     4     1     5 <tibble [1 × 10]> 
# 9     8     0     5 <tibble [2 × 10]> 
#10     6     0     5 <tibble [1 × 10]> 

So all of them have same number of columns.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thanks, that makes total sense. I forgot to realise that each element of the list column is analysed in isolation, so it will only generate the columns for which there is data. As a follow-up I've been trying to add the missing columns using the `add_column` function from [add_column](https://stackoverflow.com/questions/45857787/adding-column-if-it-does-not-exist) inside the `map` call, calling `add_column(.x, !!!cols[!(names(cols) %in% names(.))])` at the end of the pipe with `cols <- c(wt_C=NA_real_, wt_P=NA_real_, qsec_C=NA_real_, qsec_P=NA_real_)`, but that throws a duplicate col error. – Thijs Mar 04 '20 at 04:32
  • Full snippet: `cols <- c(wt_C = NA_real_, wt_P = NA_real_, qsec_C = NA_real_, qsec_P = NA_real_) mtcarsT <- mtcars %>% as_tibble() %>% group_by(cyl, gear, vs) %>% mutate(cp_flag = rep(c('C', 'P'), length.out = n())) %>% nest() %>% mutate(data = map(data, ~ .x %>% group_by(mpg, disp, hp, drat, am, carb) %>% pivot_wider(names_from = cp_flag, values_from = wt:qsec, values_fill = list(wt = NA, qsec = NA)) %>% ungroup() %>% add_column(.x, !!!cols[!(names(cols) %in% names(.))])))` – Thijs Mar 04 '20 at 04:36