I have a large dataset (about 12,000 columns) that looks like this:
> df
  ID Group val1 val2 val3
1 01     a    3    3    3
2 02     a    4    4    4
3 03     b    6    6    7
4 04     c   10   10   19
5 05     b    2    2    2
6 06     b    4    4    4
7 07     c    8    8    8
8 08     c   12   12   12
I want to loop through each column and get the IQR for each Group.
Then, for each column, I want to calculate a delta IQR for each group relative to group a.
For example:
delta IQR of b = (IQR of group b - IQR of group a) / IQR of group a
delta IQR of c = (IQR of group c - IQR of group a) / IQR of group a
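To make the formula concrete, here is a minimal base R sketch on the sample data above. The names `val_cols`, `iqr` and `delta` are only illustrative, and it assumes the value columns can be picked out by the `val` prefix and that group a is always the reference group.

```r
# Sample data from the question (ID kept as character to preserve leading zeros)
df <- data.frame(
  ID    = sprintf("%02d", 1:8),
  Group = c("a", "a", "b", "c", "b", "b", "c", "c"),
  val1  = c(3, 4, 6, 10, 2, 4, 8, 12),
  val2  = c(3, 4, 6, 10, 2, 4, 8, 12),
  val3  = c(3, 4, 7, 19, 2, 4, 8, 12),
  stringsAsFactors = FALSE
)

# Assumption: every value column starts with "val"
val_cols <- grep("^val", names(df), value = TRUE)

# Matrix of IQRs: one row per group, one column per value column
iqr <- sapply(df[val_cols], function(x) tapply(x, df$Group, IQR))

# delta IQR = (IQR of group - IQR of group a) / IQR of group a, column by column
delta <- sweep(iqr, 2, iqr["a", ], function(x, ref) (x - ref) / ref)

print(iqr)
print(delta)
```

With the default quantile type, group a has an IQR of 0.5 in every column, so for val1 the delta IQR of b is (2 - 0.5) / 0.5 = 3.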
What is the most efficient way to do this?
I tried a dplyr solution that summarises by Group, but the df is too big. I also need to calculate quantiles first, so it gets unwieldy.
The dplyr solution I tried also gives some unexpected values:
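library(dplyr)

df %>%
  group_by(Group) %>%
  # IQR of every val column within each group
  summarise_at(vars(matches('val')), IQR) %>%
  rename_at(-1, ~ paste0(., "_IQR")) %>%
  # delta relative to the first row (group a)
  mutate_at(vars(matches('val')), list(delta = ~ (. - .[1]) / .[1]))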
In my actual dataset:
> temp
   v6599_IQR v6599_IQR_delta   v1554_IQR v1554_IQR_delta
1 0.00191803    0.000000e+00 0.001794153    0.000000e+00
2 0.62698976    3.258926e+02 1.722508234    9.590677e+02
3 0.00191803    7.235440e-15 0.001794153    4.641005e-14
4 0.00191803   -3.617720e-14 2.155928869    1.200642e+03
There seems to be an error in the delta IQR for rows 3 and 4. In the first column (v6599_IQR), rows 3 and 4 show the same IQR as row 1, so their delta IQR should be exactly 0, but I get 7.235440e-15 and -3.617720e-14 instead.
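One possibility is that these tiny values are floating-point rounding rather than a real difference between the groups. A minimal sketch of how that could be checked, using the three v6599_IQR_delta values copied from the output above (the names `delta_v6599`, `tol` and `delta_clean` are only illustrative, and the tolerance is an arbitrary choice):

```r
# v6599_IQR_delta for rows 1, 3 and 4, copied from the printed output
delta_v6599 <- c(0.000000e+00, 7.235440e-15, -3.617720e-14)

# Exact comparison: only the first value is exactly zero
exact_zero <- delta_v6599 == 0

# Comparison with a tolerance (assumed here: square root of machine epsilon)
tol <- sqrt(.Machine$double.eps)
near_zero <- abs(delta_v6599) < tol

# Replace values that are zero within the tolerance by an exact 0
delta_clean <- ifelse(near_zero, 0, delta_v6599)

print(exact_zero)
print(near_zero)
print(delta_clean)
```

If `near_zero` is TRUE for rows 3 and 4, the IQRs match to within rounding error and the deltas can be treated as 0.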