I have not spent this much time on a single task in years.
There are multiple hints here on SO, for example here or here, so one is tempted to say this is a duplicate (I would even say so myself). But even with those examples and multiple attempts, I was not able to accomplish what I need.
Here is a full example:
x <- data.frame(idx=1:30, group=rep(letters[1:10],3), val=runif(30))
x$val[sample.int(nrow(x), 5)] <- NA; x
spl <- with(x, split(x, group))
lpp <- lapply(spl,
  function(x) { r <- with(x,
    data.frame(x, val_g=cut(val, seq(0,1,0.1), labels = FALSE),
               val_g_lab=cut(val, seq(0,1,0.1)))); r })
rd <- do.call(rbind, lpp); ord <- rd[order(rd$idx, decreasing = FALSE), ]; ord
aggregate(val ~ group + val_g_lab, ord,
          FUN=function(x) c(mean(x, na.rm = FALSE),
                            sum(!is.na(x))), na.action=na.pass)
The desired output: I would like the rows with NA to be included in the result of aggregate(). Currently, aggregate() drops the NA rows.
    idx group        val val_g val_g_lab
a.1   1     a 0.53789249     6 (0.5,0.6]
b.2   2     b 0.01729695     1   (0,0.1]
c.3   3     c 0.62295270     7 (0.6,0.7]
d.4   4     d 0.60291892     7 (0.6,0.7]
e.5   5     e 0.76422909     8 (0.7,0.8]
f.6   6     f 0.87433547     9 (0.8,0.9]
g.7   7     g         NA    NA      <NA>
h.8   8     h 0.50590159     6 (0.5,0.6]
i.9   9     i 0.89084068     9 (0.8,0.9]
... (continues; the full data set is the ord object)
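For reference, here is a minimal sketch of the kind of result I am after. It rests on an assumption I have not confirmed as the proper fix: that the rows vanish because the grouping column val_g_lab is itself NA, and that giving those rows an explicit label keeps them. The data frame d and the label "(missing)" are made up for illustration; d only mimics the ord object above.

```r
# Small reproducible data in the spirit of the ord object (hypothetical).
set.seed(1)
d <- data.frame(group = rep(letters[1:3], 4), val = runif(12))
d$val[c(2, 7)] <- NA
d$val_g_lab <- cut(d$val, seq(0, 1, 0.1))

# Assumption: rows are dropped because the grouping value is NA, so
# replace NA in the grouping column with an explicit, made-up label.
lab <- as.character(d$val_g_lab)
lab[is.na(lab)] <- "(missing)"
d$val_g_lab <- lab

# Same aggregate() call as in the question, with named result columns.
res <- aggregate(val ~ group + val_g_lab, d,
                 FUN = function(x) c(mean = mean(x, na.rm = FALSE),
                                     n = sum(!is.na(x))),
                 na.action = na.pass)
res
```

With this, the groups whose val is NA appear under "(missing)" with a mean of NA and a count of 0, which is the shape of output I would like to get from the original ord data.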