I am preparing a dataset for a PCA. All my variables are numeric, so I can calculate the median of each of them.
I have two grouping variables. I need to calculate the median within each group (say the first group is CATEGORIA=6 and Dpto='A', and so on) and use that value to replace the NA cells in that group. My code is:
for (j in 10:46) {
  consolidado1 <- consolidado %>%
    group_by(CATEGORIA, Dpto, .add = T) %>%
    mutate_at(vars(j), ~ ifelse(is.na(.), median(consolidado[, j], na.rm = T), .))
}
However, it is not replacing anything. When I test a single value of j, for example:
consolidado1 <- consolidado %>%
  group_by(CATEGORIA, Dpto, .add = T) %>%
  mutate_at(vars(11), ~ ifelse(is.na(.), median(consolidado[, 11], na.rm = T), .))
The NAs are replaced not with the group median but with the median of the whole column.
What's the correct way of doing this? How do I properly extract the group median?
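
For reference, here is a minimal sketch of the behaviour I am after, assuming dplyr >= 1.0 so that across() is available. The toy data frame and the columns x and y are hypothetical stand-ins for my real data and its columns 10:46. Two things seem to differ from my attempt: the median is taken on the column as seen inside the group (.x) rather than on consolidado[, j], which always refers to the full, ungrouped data frame, and all columns are handled in one mutate() call, since my loop rebuilds consolidado1 from the original consolidado on every iteration.

```r
library(dplyr)

# Hypothetical toy data standing in for `consolidado`.
consolidado <- tibble(
  CATEGORIA = c(6, 6, 6, 7, 7, 7),
  Dpto      = c("A", "A", "A", "B", "B", "B"),
  x         = c(1, NA, 3, 10, NA, 30),
  y         = c(NA, 2, 4, 5, 15, NA)
)

# Select the columns to impute by name. For the real data this would
# presumably be names(consolidado)[10:46] (an assumption about its layout).
cols <- names(consolidado)[3:4]

consolidado1 <- consolidado %>%
  group_by(CATEGORIA, Dpto) %>%
  # Inside across(), .x is the column restricted to the current group,
  # so median(.x) is the group median rather than the whole-column median.
  mutate(across(all_of(cols),
                ~ ifelse(is.na(.x), median(.x, na.rm = TRUE), .x))) %>%
  ungroup()

print(consolidado1)
```

If a group has only NA values in a column, median(.x, na.rm = TRUE) is itself NA, so those cells would stay missing.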