This is a very simplified version of my actual problem.

My real df has many columns and I need to perform this action using a select from a character vector of column names.

library(tidyverse)


df <- data.frame(a1 = c(1:5), 
             b1 = c(3,1,3,4,6), 
             c1 = c(10:14), 
             a2 = c(9:13), 
             b2 = c(3:7), 
             c2 = c(15:19))
df
  a1 b1 c1 a2 b2 c2
1  1  3 10  9  3 15
2  2  1 11 10  4 16
3  3  3 12 11  5 17
4  4  4 13 12  6 18
5  5  6 14 13  7 19

Let's say I wanted to get the cor for each row for selected columns using mutate - I tried:

df %>% 
  mutate(my_cor = cor(x = c(a1,b1,c1), y = c(a2,b2,c2)))

but this doesn't work, because inside mutate each column name refers to the entire column rather than to the value in the current row.
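
To make the failure concrete, here is a minimal base R sketch (the names x_all, y_all and whole_df_cor are made up for illustration) showing what the vectors look like when whole columns are concatenated:

```r
df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

# Inside mutate(), a1 means the whole column, so c(a1, b1, c1)
# stacks three columns of 5 values into one vector of 15.
x_all <- with(df, c(a1, b1, c1))
y_all <- with(df, c(a2, b2, c2))
length(x_all)  # 15

# cor() then returns a single number for the whole data frame,
# which mutate() would recycle into every row.
whole_df_cor <- cor(x_all, y_all)
length(whole_df_cor)  # 1
```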

The first row of the my_cor column of the output df from above should be the calculation:

cor(x = c(1,3,10), y = c(9,3,15))

And the next row should be:

cor(x = c(2,1,11), y = c(10,4,16))

and so on. The actual function I'm using is more complex but it does take two vector inputs like cor does so I figured this would be a good proxy.
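
For reference, the row-by-row values I expect can be produced with a plain base R loop. This is only a sketch to pin down the target output, not the tidyverse solution I'm after; x_cols, y_cols and expected are names made up for this example:

```r
df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

x_cols <- c("a1", "b1", "c1")
y_cols <- c("a2", "b2", "c2")

# For each row i, pull the selected columns out as plain numeric vectors
# and correlate them; vapply() guarantees one number per row.
expected <- vapply(
  seq_len(nrow(df)),
  function(i) cor(unlist(df[i, x_cols]), unlist(df[i, y_cols])),
  numeric(1)
)
expected
```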

I have a feeling I should be using purrr for this action (similar to this post) but I haven't gotten it to work.

Bonus: The actual problem I'm facing is using a function that would use many different columns, so I'd like to be able to select them from a character vector like my_list_of_cols <- c("a1", "b1", "c1") (my true list is much longer).

I suspect I'd be using pmap_dbl like the post I linked to but I can't get it to work - I tried something like...

mutate(my_col = pmap_dbl(select(., var = my_list_of_cols), somefunction))

(note that somefunction in the above portion takes two vector inputs, but one of them is static and predefined - you can assume the vector c(a2, b2, c2) is the static, predefined one, like:

somefunction <- function(a1, b1, c1) {
  # static, predefined second vector
  a2 <- 1
  b2 <- 4
  c2 <- 5
  my_vec <- c(a2, b2, c2)
  cor(x = c(a1, b1, c1), y = my_vec)
}

)

I'm still learning how to use purrr so any help would be greatly appreciated!

jmb277
  • Try `df %>% mutate(my_cor = pmap_dbl(., ~ c(...) %>% {cor(.[1:3], .[4:6])}))` – akrun Jul 31 '19 at 16:12
  • In the `pmap_dbl(., ~ c(...) %>% {cor(.[1:3], .[4:6])})` part, doesn't the `.` mean select all columns? My _true_ `df` doesn't actually use all columns. Also, in the `c(...)` - do I need to put the column names in as characters? – jmb277 Jul 31 '19 at 16:18
  • In that case you can wrap it in `select(., my_list_of_cols)`. But where is the second set of columns? You also have 'a2', 'b2', 'c2' as 'y', right? – akrun Jul 31 '19 at 16:19
  • Apologies for being dense on this - I'm still new to `purrr` - does that `select(., my_list_of_cols)` part go in place of the `.` or the `c(...)`? – jmb277 Jul 31 '19 at 16:20
  • In your code, you are only passing the 'a1', 'b1', 'c1' and the 'a2', 'b2', etc are left out – akrun Jul 31 '19 at 16:21
  • I've updated the post to include the detail that `somefunction` takes two vectors but one is static and predefined. Apologies for any confusion. – jmb277 Jul 31 '19 at 16:24
  • Without knowing how you constructed the function, it is difficult to suggest. I would pass all the relevant columns as input to `select` itself – akrun Jul 31 '19 at 16:25
  • Updated again - assume `c(a2, b2, c2)` is static and predefined. – jmb277 Jul 31 '19 at 16:28
  • Try `df %>% mutate(my_cor = pmap_dbl(select(., my_list_of_cols, a2, b2, c2), ~ c(...) %>% {cor(.[my_list_of_cols], .[setdiff(names(.), my_list_of_cols)])}))` – akrun Jul 31 '19 at 16:31
  • Replacing `my_list_of_cols` with `c("a1", "b1", "c1")`, like so: `df %>% mutate(my_cor = pmap_dbl(select(., c("a1", "b1", "c1") , a2, b2, c2), ~ c(...) %>% {cor(.[c("a1", "b1", "c1")], .[setdiff(names(.), c("a1", "b1", "c1"))])}))` works - thank you. I'm not really sure how it works, but I'll take a look into it. – jmb277 Jul 31 '19 at 16:37

1 Answer

Here is one option: store both sets of column names in character vectors, pass them to select, and compute the correlation row by row with pmap_dbl.

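library(tidyverse)

# Character vectors naming the columns used for x and y
my_list_of_cols <- c("a1", "b1", "c1")
another_list_cols <- c("a2", "b2", "c2")

df %>% 
  mutate(my_cor = pmap_dbl(
    # keep only the needed columns; all_of() selects by character vector
    select(., all_of(my_list_of_cols), all_of(another_list_cols)),
    # c(...) gathers the current row's values into one named vector
    ~ c(...) %>% 
      {cor(.[my_list_of_cols], .[another_list_cols])}
    ))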
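
Since the comments ask how this works, here is a sketch that unpacks the anonymous function into named helpers. It assumes a dplyr version that exports all_of(); row_cor, row_cor_static, static_vec, result and result_static are hypothetical names introduced only for this illustration, and the static vector c(1, 4, 5) is taken from the question's somefunction.

```r
library(dplyr)
library(purrr)

df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

my_list_of_cols <- c("a1", "b1", "c1")
another_list_cols <- c("a2", "b2", "c2")

# pmap_dbl() treats the data frame as a list of columns and calls the
# function once per row, passing that row's values as named arguments.
# c(...) gathers them into one named vector that can be split by name.
row_cor <- function(...) {
  row <- c(...)
  cor(row[my_list_of_cols], row[another_list_cols])
}

result <- df %>%
  mutate(my_cor = pmap_dbl(
    select(., all_of(c(my_list_of_cols, another_list_cols))),
    row_cor
  ))

# Variant for the case where the second vector is static and predefined:
# only the x columns are selected, and y is fixed for every row.
static_vec <- c(1, 4, 5)
row_cor_static <- function(...) cor(c(...), static_vec)

result_static <- df %>%
  mutate(my_cor = pmap_dbl(select(., all_of(my_list_of_cols)), row_cor_static))
```

Because the helpers take `...`, the character vectors can grow to any length without changing the function signature; the only requirement is that every name they contain is among the columns passed to select.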
akrun