I am calculating group means with data.table as follows:
library(data.table)
DF <- fread(
"A B D value iso year
0 1 1 NA ECU 2009
1 0 2 1 ECU 2009
1 0 1 2 ECU 2009
0 0 3 1 BRA 2011
1 0 4 0 BRA 2011
0 0 3 1 BRA 2011
0 1 7 NA ECU 2008
1 0 1 1 ECU 2008
1 0 1 1 ECU 2008
0 0 3 2 BRA 2012
0 0 3 2 BRA 2012
1 0 4 NA BRA 2012",
header = TRUE
)
setDT(DF)[,mean_value := mean(value, na.rm=TRUE), by=c("iso", "year")]
To exclude the current observation from each row's group mean,
I tried adapting (sum(value) - value)/(n()-1)
from this answer:
setDT(DF)[,mean_value := (sum(value, na.rm=TRUE)-value)/(.N-1), by=c("iso", "year")]
But I am worried about the .N-1 denominator, so I checked it:
setDT(DF)[,n_val:= (.N-1), by=c("iso", "year")]
It is always 2, whereas it should be 1 in groups where, apart from the current row, only one observation is not NA. I tried to go with:
setDT(DF)[,mean_value := (sum(value, na.rm=TRUE)-value)/(colSums(!is.na(value))-1), by=c("iso", "year")]
But that gives:
Error in base::colSums(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
I also tried:
setDT(DF)[,mean_value := (sum(value, na.rm=TRUE)-value)/(.N[!is.na(value)]-1), by=c("iso", "year")]
But that leaves too few observations. What is the missing piece of the puzzle here?
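For reference, here is a minimal sketch of the behaviour I am after, assuming the denominator should count only the non-missing values in each group. It uses sum(!is.na(value)) in place of .N, which may or may not be the idiomatic data.table way. The table DT and the column names n_all, n_obs and loo_mean are made up for illustration; the data are the first two groups from the example above.

```r
library(data.table)

# Illustrative subset of the example data (hypothetical object name DT)
DT <- data.table(
  iso   = c("ECU", "ECU", "ECU", "BRA", "BRA", "BRA"),
  year  = c(2009, 2009, 2009, 2011, 2011, 2011),
  value = c(NA, 1, 2, 1, 0, 1)
)

# Compare the two candidate counts per group
DT[, `:=`(
  n_all = .N,                  # every row in the group, NAs included
  n_obs = sum(!is.na(value))   # only the non-missing values
), by = c("iso", "year")]

# Leave-one-out mean: remove the current value from both the sum and the count.
# Assumption: rows whose own value is NA stay NA, and a group with a single
# non-missing value yields NaN (0 / 0) for that row.
DT[, loo_mean := (sum(value, na.rm = TRUE) - value) / (sum(!is.na(value)) - 1),
   by = c("iso", "year")]

print(DT)
# loo_mean is NA, 2, 1 for ECU 2009 and 0.5, 1, 0.5 for BRA 2011
```

In this sketch n_all is 3 in both groups, while n_obs is 2 for ECU 2009 and 3 for BRA 2011, which is the difference I expected to see in the denominator.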