I'm looking to normalize a variable in a data.table by subtracting the mean within each group. I have done it in the following manner:

library(data.table)
dx <- data.table(x=c(1,3,5,1,8,11),group=factor(c(1,1,1,2,2,2)))
dy <- dx[,.(xmean=mean(x)),by=.(group)]
setkey(dx,group)
setkey(dy,group)
dx[dy,x_norm:=x-xmean]

I'm wondering if there is a more concise way of doing this?

user2506086
    By the way, your way should be fairly efficient, since when you use `mean` on its own, there is some special optimization, as described here: http://stackoverflow.com/q/22137591. You could also do `dx[, xm := mean(x), by=group][, \`:=\`(x_norm = x-xm, xm = NULL)]` – Frank Apr 06 '16 at 03:37

1 Answer

You can use the scale function to do this:

dx[, x_norm := scale(x, center = TRUE, scale = FALSE), by = group]

This is equivalent to @Hadd E. Nuff's way of:

dx[, x_norm := x - mean(x), by = group]
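
As a quick sanity check, here is a minimal sketch that runs both versions on the data from the question and compares the results. The column name `x_norm_scale` and the `as.vector()` wrapper are my own additions for illustration: `scale()` returns a one-column matrix with extra attributes, so I strip it down to a plain numeric vector before assigning.

```r
library(data.table)

# Sample data from the question.
dx <- data.table(x = c(1, 3, 5, 1, 8, 11), group = factor(c(1, 1, 1, 2, 2, 2)))

# Direct approach: subtract the per-group mean.
dx[, x_norm := x - mean(x), by = group]

# scale() approach; as.vector() drops the matrix dimensions and attributes.
# (x_norm_scale is a hypothetical column name used only for this comparison.)
dx[, x_norm_scale := as.vector(scale(x, center = TRUE, scale = FALSE)), by = group]

# Compare the two columns up to floating-point tolerance.
all.equal(dx$x_norm, dx$x_norm_scale)
```

If you only need centering, the `x - mean(x)` form is probably the simpler choice, since it avoids the matrix return value altogether.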
David Arenburg
Chris