First, this is a "duplicate" of "Aggregating mixed data by factor column".
I am raising the question again because the answer there does not work when the dataset has multiple id variables. I want to aggregate a dataset so that each factor variable is summarised by its most frequent level and each numeric variable by its mean. For example (drawing on the answer by David_B):
set.seed(1)
df <- data.frame(factor = as.factor(sample(1:3, 1000, TRUE)),
                 not.factor = rnorm(1000),
                 id1 = as.factor(rep(1:10, 100)),
                 id2 = as.factor(rep(1:10, each = 100)))
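# Most frequent level of a factor
getmode <- function(v) {
  levels(v)[which.max(table(v))]
}
# Summarise x within each group of id: mean for numerics, mode for factors
ag <- function(x, id, ...) {
  if (is.numeric(x)) {
    return(tapply(x, id, mean))
  }
  if (is.factor(x)) {
    return(tapply(x, id, getmode))
  }
}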
With a single id variable, the following works:
df2 <- data.frame(lapply(df, ag, id = df$id2))
But it fails when I pass multiple id variables:
df2 <- data.frame(lapply(df, ag, id = cbind(df$id1,df$id2)))
This raises the following error:
Error in tapply(x, id, getmode) : arguments must have same length
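A likely cause, as far as I can tell: cbind() turns the two factors into a 1000 x 2 integer matrix, which tapply() treats as a single index of length 2000, so it no longer matches the 1000 values in x. The sketch below is only one possible workaround, not necessarily the best one. It assumes that combining the two ids into a single grouping factor with interaction() is acceptable; the name id12 is made up for illustration.

```r
set.seed(1)
df <- data.frame(factor = as.factor(sample(1:3, 1000, TRUE)),
                 not.factor = rnorm(1000),
                 id1 = as.factor(rep(1:10, 100)),
                 id2 = as.factor(rep(1:10, each = 100)))

# Most frequent level of a factor
getmode <- function(v) {
  levels(v)[which.max(table(v))]
}

# Mean for numeric columns, mode for factor columns, within each group of id
ag <- function(x, id, ...) {
  if (is.numeric(x)) {
    return(tapply(x, id, mean))
  }
  if (is.factor(x)) {
    return(tapply(x, id, getmode))
  }
}

# The matrix built by cbind() has 2000 elements, not 1000
length(cbind(df$id1, df$id2))

# Hypothetical helper variable: one factor with a level per (id1, id2) pair
id12 <- interaction(df$id1, df$id2, drop = TRUE)

# Same call as in the single-id case, now grouped by both ids
df2 <- data.frame(lapply(df, ag, id = id12))
```

Here df2 has one row per (id1, id2) combination, and because id1 and id2 are themselves factors, their columns in df2 hold the id values of each group. Passing id = list(df$id1, df$id2) also avoids the error, but tapply() then returns a 10 x 10 matrix for every column instead of one value per row.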