This is a very simplified version of my actual problem. My real `df` has many columns, and I need to perform this action using a `select()` from a character vector of column names.
```r
library(tidyverse)
df <- data.frame(a1 = c(1:5),
                 b1 = c(3,1,3,4,6),
                 c1 = c(10:14),
                 a2 = c(9:13),
                 b2 = c(3:7),
                 c2 = c(15:19))
df
#>   a1 b1 c1 a2 b2 c2
#> 1  1  3 10  9  3 15
#> 2  2  1 11 10  4 16
#> 3  3  3 12 11  5 17
#> 4  4  4 13 12  6 18
#> 5  5  6 14 13  7 19
```
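Let's say I want to get the `cor` for each row of selected columns using `mutate()`. I tried:

```r
df %>%
  mutate(my_cor = cor(x = c(a1, b1, c1), y = c(a2, b2, c2)))
```

but this doesn't work, because inside `mutate()` each column name refers to the whole column. So `c(a1, b1, c1)` is one long vector, and `cor()` returns a single value that ends up in every row.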
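A short check of this behaviour, assuming only `dplyr` is loaded (`out` and `whole_col_cor` are illustrative names). It compares the `mutate()` result with one correlation computed over all 15 values at once:

```r
library(dplyr)

df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

# Here a1, b1, c1 are full length-5 columns, so c(a1, b1, c1) has length 15
out <- df %>%
  mutate(my_cor = cor(x = c(a1, b1, c1), y = c(a2, b2, c2)))

# One correlation over the concatenated columns (illustrative name)
whole_col_cor <- cor(c(df$a1, df$b1, df$c1), c(df$a2, df$b2, df$c2))

# mutate() recycled that single value to all five rows
print(unique(out$my_cor))
```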
The first row of the `my_cor` column in the resulting data frame should be the result of:

```r
cor(x = c(1,3,10), y = c(9,3,15))
```

And the next row should be:

```r
cor(x = c(2,1,11), y = c(10,4,16))
```
and so on. The actual function I'm using is more complex, but it takes two vector inputs just like `cor()`, so I figured this would be a good proxy.
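To have reference values for checking any tidyverse version, here is a plain base R sketch of the intended row-by-row calculation. `x_cols`, `y_cols` and `expected` are hypothetical names chosen for illustration:

```r
df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

x_cols <- c("a1", "b1", "c1")  # hypothetical: first vector's columns
y_cols <- c("a2", "b2", "c2")  # hypothetical: second vector's columns

# For each row index i, unlist() turns the one-row slice into a numeric vector,
# so cor() sees three values per input, exactly as in the desired calculation
expected <- vapply(seq_len(nrow(df)), function(i) {
  cor(x = unlist(df[i, x_cols]), y = unlist(df[i, y_cols]))
}, numeric(1))

print(expected)
```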
I have a feeling I should be using `purrr` for this (similar to this post), but I haven't gotten it to work.
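One way `purrr` could fit, sketched under the assumption that `dplyr` and `purrr` are installed: `pmap_dbl()` walks several vectors in parallel and calls the function once per position, which here means once per row. The argument names `x1` ... `y3` and the result name `res` are hypothetical. Because the list is unnamed, values are matched to arguments by position:

```r
library(dplyr)
library(purrr)

df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

# Each call receives one row's worth of values from the six columns
res <- df %>%
  mutate(my_cor = pmap_dbl(list(a1, b1, c1, a2, b2, c2),
                           function(x1, x2, x3, y1, y2, y3) {
                             cor(x = c(x1, x2, x3), y = c(y1, y2, y3))
                           }))

print(res$my_cor)
```

Listing every column by hand stops being practical once the column list gets long, which is the motivation for selecting by name.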
Bonus: the function in my actual problem uses many different columns, so I'd like to be able to select them from a character vector like `my_list_of_cols <- c("a1", "b1", "c1")` (my true list is much longer).
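A sketch of selecting columns from character vectors without `purrr`, assuming dplyr 1.0.0 or later (which provides `c_across()`). `rowwise()` makes `mutate()` evaluate once per row, and `c_across(all_of(...))` gathers the current row's values from the named columns. `other_cols` is a hypothetical second vector of names. `rowwise()` is typically slower than a vectorised or `pmap`-based approach on large data:

```r
library(dplyr)

df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

my_list_of_cols <- c("a1", "b1", "c1")
other_cols <- c("a2", "b2", "c2")  # hypothetical second set of column names

# all_of() turns a character vector into a column selection;
# c_across() collects that selection's values for the current row
res <- df %>%
  rowwise() %>%
  mutate(my_cor = cor(c_across(all_of(my_list_of_cols)),
                      c_across(all_of(other_cols)))) %>%
  ungroup()

print(res$my_cor)
```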
I suspect I'd be using `pmap_dbl()` like the post I linked to, but I can't get it to work. I tried something like:

```r
mutate(my_col = pmap_dbl(select(., var = my_list_of_cols), somefunction))
```
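Two details seem to get in the way of that attempt. `select()` takes column names from a character vector through `all_of()`, and `pmap_dbl()` passes each row's values as *named* arguments, so the function's argument names have to match the selected columns. A minimal sketch, assuming `dplyr`, `purrr` and the magrittr `.` placeholder, using the static vector `c(1, 4, 5)` described in the question:

```r
library(dplyr)
library(purrr)

df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

my_list_of_cols <- c("a1", "b1", "c1")

# Argument names match the column names, since pmap passes each row by name
somefunction <- function(a1, b1, c1) {
  my_vec <- c(1, 4, 5)  # static vector from the question (a2 = 1, b2 = 4, c2 = 5)
  cor(x = c(a1, b1, c1), y = my_vec)
}

# `.` is the piped data frame; select() keeps only the listed columns
res <- df %>%
  mutate(my_col = pmap_dbl(select(., all_of(my_list_of_cols)), somefunction))

print(res$my_col)
```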
Note that `somefunction` above takes two vector inputs, but one of them is static and predefined. You can assume that `c(a2, b2, c2)` is the static, predefined one, like this:

```r
somefunction <- function(a1, b1, c1) {
  a2 <- 1
  b2 <- 4
  c2 <- 5
  my_vec <- c(a2, b2, c2)
  cor(x = c(a1, b1, c1), y = my_vec)
}
```
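Because the real column list is much longer, spelling out one argument per column may not scale. A sketch assuming `dplyr` and `purrr`: if the function accepts `...`, `pmap_dbl()` can pass any set of named columns, and `c(...)` collects them in the order of `my_list_of_cols`. The static vector must have the same length as the column list for `cor()` to work:

```r
library(dplyr)
library(purrr)

df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

my_list_of_cols <- c("a1", "b1", "c1")

# `...` absorbs however many named columns pmap passes for the current row
somefunction <- function(...) {
  my_vec <- c(1, 4, 5)  # static vector; needs length(my_list_of_cols) elements
  cor(x = c(...), y = my_vec)
}

res <- df %>%
  mutate(my_col = pmap_dbl(select(., all_of(my_list_of_cols)), somefunction))

print(res$my_col)
```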
I'm still learning how to use `purrr`, so any help would be greatly appreciated!