If I pass the variable bloodpressure to data.table directly, everything works fine.
library(data.table)
tdt <- data.table(bloodpressure = rnorm(1000, mean=100, sd=15), male=rep(c(0,1), times=500))
strata.var <- with(tdt, get(c('male')))
tdt[,list(
varname='bloodpressure',
N=.N,
mean=mean(bloodpressure, na.rm=TRUE),
sd=sd(bloodpressure, na.rm=TRUE)
),
by=(strata.var)]
I get this result:
   strata.var       varname   N     mean       sd
1:          0 bloodpressure 500 100.2821 15.13686
2:          1 bloodpressure 500 100.0392 15.02566
This matches the group means:
> mean(tdt$bloodpressure[tdt$male==0])
[1] 100.2821
> mean(tdt$bloodpressure[tdt$male==1])
[1] 100.0392
But if I try to do this programmatically, with the column first extracted into another variable (var):
var_as_string <- 'bloodpressure'
var <- with(tdt, get(var_as_string))
tdt[,list(
varname='bloodpressure',
N=.N,
mean=mean(var, na.rm=TRUE),
sd=sd(bloodpressure, na.rm=TRUE)
),
by=(strata.var)]
I get a different result:
   strata.var       varname   N     mean       sd
1:          0 bloodpressure 500 100.1606 15.13686
2:          1 bloodpressure 500 100.1606 15.02566
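Notice that mean is now identical in both rows (i.e. it is calculated across the whole sample, not by group), whereas sd, which still refers to bloodpressure directly, is computed per group: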
> mean(tdt$bloodpressure)
[1] 100.1606
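For comparison, here is a minimal sketch of one way the by-group lookup might be written. The likely cause of the behaviour above is that var is a plain length-1000 vector living outside tdt, so inside the call it is not a column and is not split by group. The sketch assumes that calling get() on the column name inside the j expression resolves the column per group; the seed and the by=male grouping are my own choices for illustration, not part of the original code.

```r
library(data.table)

set.seed(1)  # hypothetical seed, only so that the sketch is reproducible
tdt <- data.table(bloodpressure = rnorm(1000, mean = 100, sd = 15),
                  male = rep(c(0, 1), times = 500))

var_as_string <- 'bloodpressure'

# get() is called inside j, so the column is looked up within each group
# rather than being extracted once from the whole table beforehand
res <- tdt[, list(
    varname = var_as_string,
    N = .N,
    mean = mean(get(var_as_string), na.rm = TRUE),
    sd = sd(get(var_as_string), na.rm = TRUE)
  ),
  by = male]

print(res)
```

With this form the mean column should agree with mean(tdt$bloodpressure[tdt$male==0]) and mean(tdt$bloodpressure[tdt$male==1]) rather than with the overall mean.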