I tried to create a reproducible example but, frustratingly, this actually works:

library(tidyverse)

my_mtcars <- mtcars %>% 
  rownames_to_column('car') %>% 
  group_by(vs) %>% 
  nest()

my_mtcars <- my_mtcars %>% 
  mutate(lhs = map(.x = data, ~ .x %>% select(car:drat))) %>% 
  mutate(rhs = map(.x = data, ~ .x %>% select(car, wt:carb) %>% rename(model = car))) %>% 
  mutate(together_again = map2(.x = lhs, .y = rhs, ~ inner_join(.x, .y, by = c('car' = 'model'))))

The above works and shows, in a nutshell, what I'm trying to do with my real data. My actual data frame, which includes list columns, fails when I mutate with an inner join. I'm hoping that by describing it and showing some anonymised data here, someone may be able to advise.

My data frame pdata:

data
# A tibble: 104 x 7
   MONETIZATION_WEEK_COHORT data                   cut_off clv_obj          model            prediction       training_period_metrics
   <date>                   <list>                   <int> <list>           <list>           <list>           <list>                 
 1 2020-03-30               <tibble [214,509 × 9]>       7 <named list [2]> <named list [2]> <named list [2]> <tibble [7,328 × 3]>   
 2 2020-03-30               <tibble [214,509 × 9]>       8 <named list [2]> <named list [2]> <named list [2]> <tibble [7,328 × 3]>   
 3 2020-04-06               <tibble [496,626 × 9]>       7 <named list [2]> <named list [2]> <named list [2]> <tibble [20,060 × 3]>  
 4 2020-04-06               <tibble [496,626 × 9]>       8 <named list [2]> <named list [2]> <named list [2]> <tibble [20,060 × 3]>  
 5 2020-04-13               <tibble [595,775 × 9]>       7 <named list [2]> <named list [2]> <named list [2]> <tibble [25,816 × 3]>  
 6 2020-04-13               <tibble [595,775 × 9]>       8 <named list [2]> <named list [2]> <named list [2]> <tibble [25,816 × 3]>  
 7 2020-04-20               <tibble [548,436 × 9]>       7 <named list [2]> <named list [2]> <named list [2]> <tibble [22,161 × 3]>  
 8 2020-04-20               <tibble [548,436 × 9]>       8 <named list [2]> <named list [2]> <named list [2]> <tibble [22,161 × 3]>  
 9 2020-04-27               <tibble [529,507 × 9]>       7 <named list [2]> <named list [2]> <named list [2]> <tibble [21,113 × 3]>  
10 2020-04-27               <tibble [529,507 × 9]>       8 <named list [2]> <named list [2]> <named list [2]> <tibble [21,113 × 3]>  

I'm trying to join prediction$result with training_period_metrics for each row. Here's a sample of each; both are data frames:

The .y field in map2 below:

 pdata$prediction[[1]]$result %>% head(2) %>% glimpse
Rows: 2
Columns: 11
$ Id                      <chr> "123abc", "def456"
$ period.first            <date> 2020-05-21, 2020-05-21
$ period.last             <date> 2020-08-26, 2020-08-26
$ period.length           <int> 14, 14
$ actual.x                <int> 0, 0
$ actual.total.spending   <dbl> 0, 0
$ PAlive                  <dbl> 0.72933712, 0.05683547
$ CET                     <dbl> 19.2692978, 0.1285307
$ DERT                    <dbl> 13.37550762, 0.08921192
$ predicted.mean.spending <dbl> 839.648, 1017.683
$ predicted.CLV           <dbl> 11230.71800, 90.78944

The .x field in map2 below:

pdata$training_period_metrics[[1]] %>% head(2) %>% glimpse
Rows: 2
Columns: 3
$ S              <chr> "abc123", "def456"
$ Transactions   <int> 40, 3
$ Total_Spending <dbl> 14660, 1797

I'm trying to join these and store the resulting data frame in a new list column:

pdata %>% mutate(combined_data = map2(.x = training_period_metrics, .y = prediction, ~ inner_join(.x, .y$result, by = c('S' = 'Id'))))
Error: Problem with `mutate()` input `combined_data`.
x `x` and `y` must share the same src, set `copy` = TRUE (may be slow).
ℹ Input `combined_data` is `map2(...)`.

How can I join prediction$result with training_period_metrics within my purrr loop?

user14328853
  • Please check if all the elements in the rhs or lhs have data i.e. if i do `my_mtcars$rhs[[2]] <- NULL; my_mtcars %>% mutate(together_again = map2(.x = lhs, .y = rhs, ~ inner_join(.x, .y, by = c('car' = 'model'))))# Error: Problem with `mutate()` input `together_again`. ✖ `x` and `y` must share the same src, set `copy` = TRUE (may be slow).` – akrun Mar 24 '21 at 19:18
  • Aha! Yes, some of them are NULL – user14328853 Mar 24 '21 at 19:36
  • If you correct for those elements by skipping them, it would work. It is not clear what you want to happen in those cases – akrun Mar 24 '21 at 19:37
  • In the case of a NULL, I'd like to make the new df NULL or NA (I don't understand which is best here) else I would like to do the join – user14328853 Mar 24 '21 at 19:40
  • You can try the solution posted below. – akrun Mar 24 '21 at 19:45
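
To see which rows are affected before attempting the join, one option is to flag the NULL inputs explicitly. This is only a sketch: `toy` is a hypothetical stand-in for pdata with made-up values, and the names `missing_result`, `missing_metrics` and `can_join` are introduced here for illustration.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for pdata: the second row has no prediction result
toy <- tibble(
  cut_off = c(7L, 8L),
  training_period_metrics = list(
    tibble(S = c("a", "b"), Transactions = c(40L, 3L)),
    tibble(S = c("a", "b"), Transactions = c(12L, 5L))
  ),
  prediction = list(
    list(result = tibble(Id = c("a", "b"), PAlive = c(0.73, 0.06))),
    list(result = NULL)
  )
)

# TRUE marks rows where one side of the join is missing
missing_result  <- map_lgl(toy$prediction, ~ is.null(.x$result))
missing_metrics <- map_lgl(toy$training_period_metrics, is.null)

# One logical column showing which rows can be joined safely
toy %>%
  mutate(can_join = !missing_result & !missing_metrics) %>%
  select(cut_off, can_join)
```

Rows where `can_join` is FALSE are the ones that would trigger the error in the join.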

1 Answer

We can use a condition to do the join only if both .x and .y are not NULL, and return NULL otherwise:

my_mtcars %>%
  mutate(together_again = map2(.x = lhs, .y = rhs,
    ~ if (is.null(unlist(.x)) | is.null(unlist(.y))) list(NULL) else
        inner_join(.x, .y, by = c('car' = 'model'))))
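
The same idea might be adapted to the shape of the data in the question, where the join target lives in `prediction$result`. The following is a minimal sketch rather than a tested solution for the real data: `toy` is a hypothetical stand-in for pdata and `join_or_null()` is a helper name invented here.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical helper: join only when both inputs are present, else NULL
join_or_null <- function(metrics, pred) {
  if (is.null(metrics) || is.null(pred$result)) {
    return(NULL)
  }
  inner_join(metrics, pred$result, by = c("S" = "Id"))
}

# Hypothetical stand-in for pdata: the second row has no prediction result
toy <- tibble(
  training_period_metrics = list(
    tibble(S = c("a", "b"), Transactions = c(40L, 3L)),
    tibble(S = c("a", "b"), Transactions = c(12L, 5L))
  ),
  prediction = list(
    list(result = tibble(Id = c("a", "b"), PAlive = c(0.73, 0.06))),
    list(result = NULL)
  )
)

toy <- toy %>%
  mutate(combined_data = map2(training_period_metrics, prediction, join_or_null))
```

Here the helper returns a bare `NULL`, so the list column holds `NULL` for the skipped rows and a joined tibble for the rest.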
akrun
  • I should be able to do this on my own but I'm trying and failing to modify your solution to ifelse() syntax. Is that doable here? – user14328853 Mar 24 '21 at 19:54
  • @user14328853 `ifelse` is not appropriate for this case. It is vectorized and operates on every element, but in our case the elements, i.e. data.frames, are inside a list, so `if/else` would be better. – akrun Mar 24 '21 at 19:57
  • Or another option is to wrap with `tryCatch` and return a value when there is an error – akrun Mar 24 '21 at 19:58
  • I took a look at the docs for tryCatch in R and find them tricky to follow. My actual code `%>% mutate(combined_data = map2(.x = training_period_metrics, .y = prediction, ~ tryCatch(inner_join(.x, .y$result, by = c('S' = 'Id')))))` gives the same error. Should I ask a fresh question? – user14328853 Mar 24 '21 at 20:49
  • @user14328853 This [link](https://stackoverflow.com/questions/12193779/how-to-write-trycatch-in-r) may help you write the `tryCatch`. In your code you have to specify the `error = ` argument – akrun Mar 24 '21 at 20:50
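
For reference, a `tryCatch()` call with no `error =` handler simply re-raises the error, which is why the attempt in the comments still failed. A minimal sketch of the handler form follows, assuming a failed join should yield NULL; `toy` is a hypothetical stand-in for pdata and `safe_join()` is a name invented here.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical wrapper: return NULL if the join raises any error
safe_join <- function(metrics, pred) {
  tryCatch(
    inner_join(metrics, pred$result, by = c("S" = "Id")),
    error = function(e) NULL
  )
}

# Hypothetical stand-in for pdata: the second row has no prediction result
toy <- tibble(
  training_period_metrics = list(
    tibble(S = c("a", "b"), Transactions = c(40L, 3L)),
    tibble(S = c("a", "b"), Transactions = c(12L, 5L))
  ),
  prediction = list(
    list(result = tibble(Id = c("a", "b"), PAlive = c(0.73, 0.06))),
    list(result = NULL)
  )
)

toy <- toy %>%
  mutate(combined_data = map2(training_period_metrics, prediction, safe_join))
```

One trade-off of this approach is that the handler swallows every error, including ones caused by a mistyped column name, so the explicit NULL check may be easier to debug.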