For the past week I have been trying to aggregate, in R, a dataset of weight measurements taken in different months, accompanied by a large number of background variables.
I have read many questions on this topic (e.g. R aggregate data by defining grouping, How to aggregate count of unique values of categorical variables in R), but the answers either work with only one type of data or deal with only one column. In particular, the question Recoding categorical variables to the most common value deals with almost exactly the same problem, but the proposed answer only handles the categorical data and does not cover the numeric data. My data consist of both factors (categorical and ordinal) and numeric data.
The reproducible example is:
IDnumber <- c("1", "1", "1", "2", "2", "3", "3", "3")
Gender <- c("Male", "Male", "Male", "Female", "Female", "Female", "Female", "Female")
Weight <- c(80, 82, 82, 70, 66, 54, 50, 52)
LikesSoda <- c("Yes", "No", "No", "Yes", "Yes", "Yes", "Yes", NA)
df = data.frame(IDnumber, Gender, Weight, LikesSoda)
My output data frame should contain, for each IDnumber, the mean of each numeric column and the most frequent level of each factor column. For the example, this would look as follows:
IDnumber <- c("1", "2", "3")
Gender <- c("Male", "Female", "Female")
Weight <- c(81.33, 68, 52)  # group means: (80+82+82)/3, (70+66)/2, (54+50+52)/3
LikesSoda <- c("No", "Yes", "Yes")
output = data.frame(IDnumber, Gender, Weight, LikesSoda)
So far I have tried splitting the data frame into a factor data frame and a numeric data frame and running aggregate on each with a different function (mean for the numeric part, but I have not been able to find a working function for the categorical part). The other option is a dplyr pipeline such as df %>% group_by(IDnumber) %>% summarise( transformation for each variable ), but that requires me to specify how to handle each column manually. Since I have over 2500 columns, this does not seem like a workable solution.
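To make the requirement concrete, the rule (mean for numeric columns, most frequent value otherwise) could be written as one helper that dispatches on column type, so no column has to be named by hand. This is only a minimal base R sketch, not a tuned solution: summarise_column is a hypothetical helper name, NA values are ignored, ties go to whichever value table() lists first, and it may be slow on 2500 columns.

```r
# Example data from the question.
df <- data.frame(
  IDnumber  = c("1", "1", "1", "2", "2", "3", "3", "3"),
  Gender    = c("Male", "Male", "Male", "Female", "Female", "Female", "Female", "Female"),
  Weight    = c(80, 82, 82, 70, 66, 54, 50, 52),
  LikesSoda = c("Yes", "No", "No", "Yes", "Yes", "Yes", "Yes", NA)
)

# Hypothetical helper: mean for numeric columns,
# most frequent non-NA value for everything else.
summarise_column <- function(x) {
  if (is.numeric(x)) {
    return(mean(x, na.rm = TRUE))
  }
  counts <- table(x)                    # table() drops NA by default
  if (length(counts) == 0) {
    return(NA_character_)               # group has only NA values
  }
  names(counts)[which.max(counts)]      # ties: first value listed wins
}

# Split by ID, collapse every column of each group to one value,
# then stack the one-row data frames back together.
pieces <- lapply(split(df, df$IDnumber), function(g) {
  as.data.frame(lapply(g, summarise_column), stringsAsFactors = FALSE)
})
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

Because the helper is also applied to IDnumber itself, the grouping column survives as its (only) most frequent value within each group. The categorical columns come back as character vectors rather than factors.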