(I do understand that my question is equivalent to this one : R function equivalent to proc summary in SAS But being a new user, I can't comment on the solutions to ask details or explanations and I can't get any of them to work.)
I'm trying to convert a script from SAS to R. The objective is to get a wide summary of a database across multiple variables.
The starting base is like this :
Student ID | Flag1 | Flag2 | Flag3 | other flags... | weight | score |
---|---|---|---|---|---|---|
code1 | level1 | A | first | smth~~ | 2 | 12 |
code23 | level5 | C | third | smth~else~ | 3 | 9 |
And in the end I want something like this :
Flag1 | Flag2 | Flag3 | other flags... | nb of students | weighted mean | std dev | min | 1st quartile | ... | max | nb of students in fist decile | ... | nb of students in last decile |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
level1 | A | first | smth~~ | 5 | 10.96 | 1.5 | 1 | ... | ... | ... | ... | ... | ... |
level5 | .All | third | smth~else~ | 1500 | 8.70 | 2.7 | 3 | ... | ... | ... | ... | ... | ... |
In SAS it was really easy because proc summary does the summary for each combination of grouping possible, but in R, you only get the lowest level of grouping. With 9 different levels of grouping that's 512 combinations and I think there should be a way to loop some of the work.
Here's how I think I should proceed :
1- List all the different combinations in a dataframe :
Flag1 | Flag2 | Flag3 |
---|---|---|
.All | .All | .All |
.All | .All | first |
.All | .All | second |
.All | A | .All |
.All | B | .All |
LV1 | .All | .All |
LV2 | .All | .All |
.All | A | first |
.All | A | second |
.All | B | first |
.All | B | second |
LV1 | .All | first |
LV1 | .All | second |
LV2 | .All | first |
LV2 | .All | second |
LV1 | A | .All |
LV1 | B | .All |
LV2 | A | .All |
LV2 | B | .All |
LV1 | A | first |
LV1 | A | second |
LV1 | B | first |
LV1 | B | second |
LV2 | A | first |
LV2 | A | second |
LV2 | B | first |
LV2 | B | second |
2- Make a 2^n length loop that will call the following function :
3- The function would take a line from the last dataframe and then output a dataframe that would contain the summary grouping by some variables + columns with .All for the variables not used for grouping
4- stack each iteration of the loop on each other using bind_rows