I have a data frame in R that looks like this:

head(df)

  Scoresheet.Id Entry.Number Round Judge.Name Judge.Initials Raw.Score
1        264372          608     2      Allen             ag        79
2        266552         2493     2      Allen             ag        67
3        265218         1996     1      Allen             ag        65
4        266554         2751     2      Allen             ag        64
5        266551         2399     2      Allen             ag        63
6        262825          113     1      Allen             ag        62

There are many more judges in the full data.

I'm trying to create a new column in the data frame with a Z-score. I'm able to calculate the Z-scores from each judge's raw scores with:

with(df, tapply(as.numeric(df$Raw.Score), df$Judge.Name, scale))

That yields an array with one element per judge.

How can I put the resulting Z-scores in a new column of the data frame?

agf1997

2 Answers

Easy to do with data.table, avoiding the tapply completely.

library(data.table)

setDT(df)
df[, Zscore := scale(Raw.Score), by = Judge.Name]

   Scoresheet.Id Entry.Number Round Judge.Name Judge.Initials Raw.Score      Zscore
1:        264372          608     2      Allen             ag        79  1.96320316
2:        266552         2493     2      Allen             ag        67  0.05305954
3:        265218         1996     1      Allen             ag        65 -0.26529772
4:        266554         2751     2      Allen             ag        64 -0.42447636
5:        266551         2399     2      Allen             ag        63 -0.58365499
6:        262825          113     1      Allen             ag        62 -0.74283363
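As a sanity check on what scale() is doing here, a minimal sketch using only the six raw scores shown above: with its default arguments, scale() subtracts the mean and divides by the sample standard deviation, so the same numbers can be computed by hand. The variable names raw, z_manual and z_scale are just for illustration.

```r
# The six Raw.Score values from the sample rows above
raw <- c(79, 67, 65, 64, 63, 62)

# Z-score by hand: centre on the mean, divide by the sample standard deviation
z_manual <- (raw - mean(raw)) / sd(raw)

# scale() returns a one-column matrix; as.numeric() drops it to a plain vector
z_scale <- as.numeric(scale(raw))

# TRUE when both approaches agree
all.equal(z_manual, z_scale)
```

The values in z_manual should line up with the Zscore column printed above for these six rows.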

If you're trying to avoid adding package dependencies, try aggregate:

df <- as.data.frame(df)
df$Zscore <- unlist(aggregate(Raw.Score ~ Judge.Name, df, FUN = "scale"))[-1]
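One caveat, as far as I can tell: the aggregate one-liner returns its results grouped by judge, so it may not line up with the original rows once there are several judges with different numbers of scores. Another base R option is ave(), which applies a function within each group and returns the values in the original row order. A minimal sketch, using made-up data (the judge "Baker" and the scores below are hypothetical):

```r
# Hypothetical sample data: two judges with different numbers of scores
df <- data.frame(
  Judge.Name = c("Allen", "Baker", "Allen", "Baker", "Allen"),
  Raw.Score  = c(79, 70, 67, 60, 65)
)

# ave() runs FUN separately for each judge and puts the results
# back in the same row order as the input vector
df$Zscore <- ave(
  as.numeric(df$Raw.Score),
  df$Judge.Name,
  FUN = function(x) as.numeric(scale(x))
)
```

Because ave() always returns a vector as long as its input, the new column has one value per row regardless of how many scores each judge has.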

Eric Watt

z_scores <- with(df, tapply(as.numeric(df$Raw.Score), df$Judge.Name, scale))

Then just bind the Z scores to the df:

cbind(df, z_scores[[1]][, 1])

And voila.

Odysseus210
  • This doesn't seem to work. The length of z_scores is 85, which is equal to the number of judges. The cbind step returns `Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 11764, 85` – agf1997 Jul 10 '17 at 21:52
  • My bad, agf1997. I was actually getting a slightly different error message than you though, interestingly. Made an edit that should do the trick. :) – Odysseus210 Jul 10 '17 at 22:09
  • Now it's `Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 11764, 24` – agf1997 Jul 10 '17 at 22:14
  • Interesting. I'm not getting that message. :/ – Odysseus210 Jul 10 '17 at 22:15
  • Does it require the same number of scores per judge? Each judge has a different number of scores. – agf1997 Jul 10 '17 at 22:17
  • I'm getting the exact same result as Eric Watt. And just using the exact same with() function as the OP. – Odysseus210 Jul 10 '17 at 22:20
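
The row-count errors in the comments are consistent with tapply() returning one element per judge rather than one value per row, so a single element (or the whole array) cannot be bound to the full data frame directly. A minimal sketch of one way around that with split() and unsplit(), assuming made-up data in which judges have unequal numbers of scores (the judge "Baker" and the names by_judge and z_list are hypothetical):

```r
# Hypothetical data: judges with unequal numbers of scores
df <- data.frame(
  Judge.Name = c("Allen", "Baker", "Allen", "Baker", "Allen"),
  Raw.Score  = c(79, 70, 67, 60, 65)
)

# tapply() gives one element per judge, not one value per row
z_scores <- with(df, tapply(as.numeric(Raw.Score), Judge.Name, scale))
length(z_scores)

# split() makes a plain list of scores per judge, lapply() scales each one,
# and unsplit() puts the values back in the original row order
by_judge <- split(as.numeric(df$Raw.Score), df$Judge.Name)
z_list <- lapply(by_judge, function(x) as.numeric(scale(x)))
df$Zscore <- unsplit(z_list, df$Judge.Name)
```

This does not require the same number of scores per judge, since unsplit() uses the grouping vector to decide where each value goes.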