I'm a fairly new R user working through Dataquest.io's Data Analyst in R course, specifically the NYC School Perception Guided Project. I am trying to write a custom function that selects two columns into a tibble and sorts that tibble in descending order by correlation coefficient. The function selects the intended columns without a problem, but it never sorts the tibbles. I'd like each tibble sorted in descending order so that the pairs with the highest coefficients appear in the top rows.

You can download the correlation data I have so far on my GitHub page.

I wrote a custom function to select the columns and arrange the rows in descending order, then mapped it over two character vectors of column names to generate a list of 16 tibbles:

cor_func <- function(x,y) {
  cor_select %>% 
    dplyr::select(x,y) %>%
    dplyr::arrange(desc(y))
}

x_var <- names(cor_select)[1]
y_var <- names(cor_select)[c(6, 11, 13:26)]

cor_rank <- map2(x_var, y_var, cor_func)

When I index one of the tibbles from the resulting list, say cor_rank[1], I get a tibble that isn't sorted:

[[1]]
# A tibble: 16 x 2
   variable   avg_sat_score
   <chr>              <dbl>
 1 saf_p_11          0.113 
 2 com_p_11         -0.0909
 3 eng_p_11          0.0314
 4 aca_p_11          0.0330
 5 saf_t_11          0.303 
 6 com_t_11          0.0937
 7 eng_t_11          0.0488
 8 aca_t_11          0.137 
 9 saf_s_11          0.272 
10 com_s_11          0.163 
11 eng_s_11          0.167 
12 aca_s_11          0.286 
13 saf_tot_11        0.280 
14 com_tot_11        0.0881
15 eng_tot_11        0.0956
16 aca_tot_11        0.177 

I've tried adding group_by to the function, removing the dplyr:: prefixes, and a couple of other troubleshooting options, all of which either changed nothing or produced errors. I'm open to other solutions involving different packages or creating different objects entirely out of the data. This is only my second question on Stack Overflow, posted after hours of not finding an exact solution, so apologies if there are any reproducibility issues.

1 Answer

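When you pass column names into a function as strings, you have to deal with non-standard evaluation (NSE): x_var and y_var hold character values, and R doesn't know that they should be treated as columns. select() accepts strings, but arrange(desc(y)) sorts by the string itself, which is a single constant value, so the row order never changes.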
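To make the difference concrete, here is a minimal sketch on made-up data. The `toy` tibble and its column names are hypothetical stand-ins for cor_select, not the asker's actual data:

```r
library(dplyr)

# Hypothetical toy data standing in for cor_select
toy <- tibble::tibble(
  variable = c("a", "b", "c"),
  score    = c(0.1, 0.3, 0.2)
)

y <- "score"

# desc(y) is computed on the string "score", a single constant value,
# so every row gets the same sort key and the order is unchanged
unsorted <- toy %>% arrange(desc(y))

# sym() turns the string into a symbol and !! unquotes it,
# so arrange() sorts by the actual score column
sorted <- toy %>% arrange(desc(!!rlang::sym(y)))
```

Here `unsorted` keeps the original row order, while `sorted` is ordered from the highest score to the lowest.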

Also, since x_var has only one value, you don't need to iterate over it with map2; you can pass it as a constant argument instead. Try:

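library(dplyr)
library(rlang)
library(purrr)

# x and y are column names passed as strings
cor_func <- function(data, x, y) {
  data %>%
    # all_of() makes it explicit that x and y hold column names
    dplyr::select(dplyr::all_of(c(x, y))) %>%
    # sym() converts the string to a symbol; !! unquotes it so desc() sees the column
    dplyr::arrange(desc(!!sym(y)))
}

# Iterate over y_var only; data and x are passed as constant arguments
cor_rank <- map(y_var, cor_func, data = cor_select, x = x_var)
cor_rank[[1]]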

#    X SAT.Writing.Avg..Score
#1   5          0.28334515007
#2  12          0.27857524686
#3  13          0.26180042755
#4   9          0.24290539391
#5  16          0.17148497238
#6  10          0.15362911663
#7  11          0.14843858472
#8   8          0.12535108907
#9   1          0.11940609104
#10  6          0.09354663114
#11 14          0.09043009320
#12 15          0.08937188222
#13  3          0.04655940169
#14  7          0.04275641551
#15  4          0.04050495122
#16  2         -0.07174707738
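As a possible alternative to `!!sym(y)`, the `.data` pronoun can look up a column by its string name. The sketch below assumes a recent dplyr (1.0 or later); the `toy` data, the `rank_by` helper, and the column names are all hypothetical and only illustrate the pattern:

```r
library(dplyr)
library(purrr)

# Hypothetical toy data: one label column and two coefficient columns
toy <- tibble::tibble(
  variable = c("saf", "com", "eng"),
  score_a  = c(0.11, -0.09, 0.03),
  score_b  = c(0.05, 0.20, 0.15)
)

# rank_by() is an illustrative helper, not part of any package
rank_by <- function(data, x, y) {
  data %>%
    select(all_of(c(x, y))) %>%   # pick columns named by the strings
    arrange(desc(.data[[y]]))     # sort by the column whose name is in y
}

y_cols <- c("score_a", "score_b")

# One sorted tibble per coefficient column, named after that column
ranked <- map(y_cols, rank_by, data = toy, x = "variable")
ranked <- set_names(ranked, y_cols)
```

Naming the list elements lets you pull out a result with `ranked$score_a` instead of remembering its position.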
Ronak Shah