I'm trying to add a column to a data frame that holds values normalized within each level of a factor.

For example:

'data.frame':   261 obs. of  3 variables:
 $ Area   : Factor w/ 29 levels "Antrim","Ards",..: 1 1 1 1 1 1 1 1 1 2 ...
 $ Year   : Factor w/ 9 levels "2002","2003",..: 1 2 3 4 5 6 7 8 9 1 ...
 $ Arrests: int  18 54 47 70 62 85 96 123 99 38 ... 

I'd like to add a column containing the Arrests values normalized within each Area group.

The best I've come up with is:

data$Arrests.norm <- unlist(unname(by(data$Arrests,data$Area,function(x){ scale(x)[,1] } )))

This command runs, but the result is scrambled, i.e., the normalized values don't line up with the correct Areas in the data frame.

Appreciate your tips.

EDIT: Just to clarify what I mean by scrambled data: subsetting the data frame after running my code gives output like the following, where the normalized values clearly belong to another factor group.

     Area Year Arrests Arrests.norm
199 Larne 2002      92 -0.992843957
200 Larne 2003     124 -0.404975825
201 Larne 2004      89 -1.169204397
202 Larne 2005      94 -0.581336264
203 Larne 2006      98 -0.228615385
204 Larne 2007       8  0.006531868
205 Larne 2008      31  0.418039561
206 Larne 2009      25  0.947120880
207 Larne 2010      22  2.005283518
JMcClure

2 Answers

Following up on your `by` attempt:

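# Toy data: a two-level factor A and a numeric column B.
df <- data.frame(A = factor(rep(c("a", "b"), each = 4)),
                 B = sample(1:4, 8, TRUE))

# Scale B within each level of A and return the whole sub-data-frame,
# so every scaled value stays attached to its own row.
ll <- by(data = df, df$A, function(x) {
  x$B_scale <- scale(x$B)
  x
})

# Stack the per-group data frames back into a single data frame.
df2 <- do.call(rbind, ll)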
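
To see that the values stay matched to their rows even when the groups are interleaved, here is a minimal sketch on made-up data. The values below are hypothetical and not from the question, and `[, 1]` is my addition to store a plain numeric vector instead of the one-column matrix that `scale()` returns.

```r
# Hypothetical toy data with interleaved groups (not from the question).
df <- data.frame(A = factor(c("b", "a", "b", "a")),
                 B = c(10, 1, 30, 3))

# Same pattern as above: scale within each group, return the sub-data-frame.
ll <- by(data = df, df$A, function(x) {
  x$B_scale <- scale(x$B)[, 1]  # assumption: [, 1] drops the matrix to a vector
  x
})
df2 <- do.call(rbind, ll)

str(df2)  # a data frame with columns A, B and B_scale
df2       # rows come back grouped by level of A, each B next to its own B_scale
```

Note that the rows of `df2` are grouped by factor level rather than kept in their original order, so use `df2` as the result instead of copying a single column back into the original data frame.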
Henrik
  • Is df2 a double matrix? When I assign it to the data frame, the same mismatch as in my edit applies. – JMcClure Oct 13 '13 at 23:49
  • No, df2 corresponds to your final data frame. No need to 'assign' it. Run `str(df2)`. Because you didn't provide a [minimal, reproducible data set](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610), I made up a small example. – Henrik Oct 13 '13 at 23:57
  • Right, missed a wayward 'x' in the function. Thanks much. – JMcClure Oct 14 '13 at 00:41
2
data <- transform(data, Arrests.norm = ave(Arrests, Area, FUN = scale))

will do the trick.
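
A likely reason the `unlist(by(...))` attempt ends up scrambled is that `by()` returns one result per factor level, in level order, while `ave()` writes each group's result back to the rows it came from. Here is a minimal sketch on made-up data; the `toy` data frame and its values are hypothetical, and I have not seen the real data, so treat the diagnosis as an assumption.

```r
# Hypothetical toy data: rows are deliberately NOT sorted by Area.
toy <- data.frame(
  Area    = factor(c("b", "a", "b", "a", "b", "a")),
  Arrests = c(10, 1, 20, 2, 30, 3)
)

# by() groups in level order ("a" first, then "b"), so unlist() yields
# all "a" results followed by all "b" results, whatever the row order is.
level_order <- unlist(unname(by(toy$Arrests, toy$Area,
                                function(x) scale(x)[, 1])))
level_order  # -1  0  1 -1  0  1

# ave() puts each group's result back at that group's original row positions.
row_order <- ave(toy$Arrests, toy$Area, FUN = function(x) scale(x)[, 1])
row_order    # -1 -1  0  0  1  1
```

The two vectors only agree when the data frame is already sorted by the levels of the grouping factor, which would explain why assigning the `by()` result straight into a column mismatches the rows.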

Sven Hohenstein
  • This produces the same problem I had in that the data is mixed. For example `subset(data,data$Area =="Larne")` produces mismatched data such as `Newtownabbey1 Larne 2002 92 -0.992843957` – JMcClure Oct 13 '13 at 23:09
  • @JonMac Right, I modified the answer. Now, the order is correct. – Sven Hohenstein Oct 14 '13 at 06:17