I am trying to create some groups based on the percent rank of some values in dplyr
.
The code below creates a data frame and then sapply
a function to determine the groups. The downside is that I can't get sapply to work for tbl_postgres
, only data frames. So I'm curious if there is another solution for this.
I had considered something with ntile, but the groups I want to create have some arbitrary cut-offs. Also, I have not had much luck getting it to work with dplyr
(maybe pure sql might work).
library(dplyr)
n <- 100
df1 <- data.frame(idx = 1:n, x = rnorm(n))
df1 <- df1 %>%
arrange(x) %>%
mutate(pc_x = percent_rank(x))
index <- function(x) {
if (x < 0) {
return(NA)
} else if (x < 0.3) {
return(1)
} else if (x < 0.7) {
return(2)
} else if (x <= 1) {
return(3)
} else {
return(NA)
}
}
df1 <- df1 %>%
mutate(group = sapply(pc_x, index))