I have this dataframe:
set.seed(42) ## for sake of reproducibility
df <- data.frame("time"=c(1:40), "Var1"=sample(1:100, size=40),
"Var2"=sample(1:100, size=40), "Var3"=sample(1:100, size=40))
head(df)
# time Var1 Var2 Var3
# 1 1 49 3 38
# 2 2 65 21 1
# 3 3 25 2 13
# 4 4 74 58 78
# 5 5 18 10 5
# 6 6 100 40 73
As I want to get the quantiles per column, I have this code:
(quantiles <- as.data.frame(apply(df[2:4] , 2 , quantile, probs=seq(0, 1, 1/10), na.rm=TRUE)))
# Var1 Var2 Var3
# 0% 2.0 2.0 1.0
# 10% 5.9 8.9 11.4
# 20% 19.6 17.6 15.8
# 30% 25.7 31.1 28.1
# 40% 35.2 41.2 35.8
# 50% 42.5 51.0 42.5
# 60% 53.2 57.4 56.4
# 70% 67.3 70.2 66.0
# 80% 80.8 80.4 78.6
# 90% 89.4 90.5 90.1
# 100% 100.0 99.0 100.0
My objective is to add to my original data frame (df) one column per variable holding the quantile bin (decile) that each value falls into. To achieve that, I have this code:
df$QuantVar1 <- .bincode(x=df$Var1, breaks=quantiles$Var1, include.lowest=TRUE, right=TRUE)
df$QuantVar2 <- .bincode(x=df$Var2, breaks=quantiles$Var2, include.lowest=TRUE, right=TRUE)
df$QuantVar3 <- .bincode(x=df$Var3, breaks=quantiles$Var3, include.lowest=TRUE, right=TRUE)
head(df)
# time Var1 Var2 Var3 QuantVar1 QuantVar2 QuantVar3
# 1 1 49 3 38 6 1 5
# 2 2 65 21 1 7 3 1
# 3 3 25 2 13 3 1 2
# 4 4 74 58 78 8 7 8
# 5 5 18 10 5 2 2 1
# 6 6 100 40 73 10 4 8
(Note that I use .bincode because I didn't have unique breaks and I found this solution).
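To illustrate the point about non-unique breaks, here is a small sketch with a made-up vector `x` (not my real data) that has many tied values, so several quantiles coincide. As far as I understand, `cut()` refuses duplicated breaks while `.bincode()` accepts them as long as they are sorted:

```r
# Hypothetical toy vector with many ties, so some quantiles are identical
x <- c(0, 0, 0, 0, 0, 0, 1, 2, 3, 4)
breaks <- quantile(x, probs = seq(0, 1, 0.25))

# TRUE when at least two break points share the same value
has_duplicates <- anyDuplicated(breaks) > 0

# cut() stops with an error on duplicated breaks; capture the message instead
cut_attempt <- tryCatch(
  cut(x, breaks = breaks, include.lowest = TRUE),
  error = function(e) conditionMessage(e)
)

# .bincode() only needs the breaks to be sorted, so it still returns bin codes
bins <- .bincode(x, breaks = breaks, include.lowest = TRUE, right = TRUE)
```

With duplicated breaks some bin codes can never occur, because the corresponding intervals are empty.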
As I want each new column next to the original variable, I relocate each of them manually:
library(dplyr); library(tidyft)
df <- df %>%
relocate(QuantVar1, .after = Var1)
df <- df %>%
relocate(QuantVar2, .after = Var2)
df <- df %>%
relocate(QuantVar3, .after = Var3)
head(df)
# time Var1 QuantVar1 Var2 QuantVar2 Var3 QuantVar3
# 1 1 49 6 3 1 38 5
# 2 2 65 7 21 3 1 1
# 3 3 25 3 2 1 13 2
# 4 4 74 8 58 7 78 8
# 5 5 18 2 10 2 5 1
# 6 6 100 10 40 4 73 8
The code works perfectly. But what if I have 100 variables or more? I cannot repeat the process by hand 100 times or more.
I want to avoid explicit loops, and I have been trying to use the lapply family.
I have already seen how to add new columns with lapply in this post, but I don't know if there is a way to place each new column next to the column it is computed from, as in the example above.
Does anybody have an idea about how to do it?
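For reference, the direction I have been exploring looks roughly like the sketch below. It is only a sketch under some assumptions: every column except `time` should be binned, the new columns are named by prefixing `Quant`, and the helper names `vars`, `bins`, `df_binned` and `ordered_cols` are my own. It computes all bin columns with `lapply` and then interleaves the column names in base R instead of calling `relocate` once per variable:

```r
set.seed(42)
df <- data.frame(time = 1:40,
                 Var1 = sample(1:100, size = 40),
                 Var2 = sample(1:100, size = 40),
                 Var3 = sample(1:100, size = 40))

# Columns to bin (assumption: everything except `time`)
vars  <- setdiff(names(df), "time")
probs <- seq(0, 1, 1/10)

# One vector of decile codes per variable, using that variable's own quantiles
bins <- lapply(df[vars], function(x) {
  .bincode(x,
           breaks = quantile(x, probs = probs, na.rm = TRUE),
           include.lowest = TRUE, right = TRUE)
})
names(bins) <- paste0("Quant", vars)

# Append the new columns, then reorder so each one follows its source column.
# rbind() stacks the two name vectors as rows; as.vector() reads the matrix
# column by column, which yields Var1, QuantVar1, Var2, QuantVar2, ...
df_binned    <- cbind(df, as.data.frame(bins))
ordered_cols <- c("time", as.vector(rbind(vars, names(bins))))
df_binned    <- df_binned[ordered_cols]

head(df_binned)
```

The interleaving step does not depend on how many variables there are, so it should behave the same with 100 columns, but I am not sure whether this is the idiomatic way to do it.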