I want to calculate z-scores using the mean and standard deviation of each group. For example, the table below has three groups of data. I can generate the mean and standard deviation for each group; I then want to use the group 1 mean and SD to calculate the z-scores for the group 1 data points, and so on for the other groups.

> dat
   group level    y
1      1     A 10.8
2      1     B 12.0
3      1     C  9.6
4      1     A 12.0
5      1     B  7.8
6      1     C 10.8
7      2     A  8.7
8      2     B  9.2
9      2     C  8.2
10     2     A 10.0
11     2     B 12.2
12     2     C  8.2
13     3     A 10.9
14     3     B  8.3
15     3     C 10.1
16     3     A  9.9
17     3     B 10.9
18     3     C 10.3
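
For anyone who wants to reproduce this, the table can be rebuilt roughly as follows. This is only a sketch: the values are typed in from the printout above, and the `aggregate()` call is just one assumed way of getting the per-group summaries mentioned below.

```r
# Rebuild the example table; values copied from the printout above.
dat <- data.frame(
  group = rep(1:3, each = 6),
  level = rep(c("A", "B", "C"), times = 6),
  y = c(10.8, 12.0,  9.6, 12.0,  7.8, 10.8,
         8.7,  9.2,  8.2, 10.0, 12.2,  8.2,
        10.9,  8.3, 10.1,  9.9, 10.9, 10.3)
)

# Per-group mean and standard deviation, the summaries the z-scores are built from.
group_stats <- aggregate(y ~ group, data = dat,
                         FUN = function(v) c(mean = mean(v), sd = sd(v)))
print(group_stats)
```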

I have learned from this blog how to get summary data by group, but I am not sure how to go from there.

Thanks.

4 Answers

14

Base R (i.e., no dependencies required) includes the functions ave() (for group-wise application) and scale() (for calculating z-scores):

dat$z <- ave(dat$y, dat$group, FUN=scale)

Then the new variable z in dat will contain the groupwise-scaled variable.

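Note that, unlike with similar base R functions (e.g., sapply, lapply), you need to name FUN= explicitly. The signature is ave(x, ..., FUN = mean), so FUN comes after the ... argument, and an unnamed function would be taken as another grouping variable.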
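
As a quick sanity check, the sketch below compares the ave()/scale() result with the textbook formula (y - mean) / sd applied within each group. The data frame is retyped from the question, and `z_manual` is a name introduced here purely for illustration.

```r
# Example data retyped from the question (level column omitted; not needed here).
dat <- data.frame(
  group = rep(1:3, each = 6),
  y = c(10.8, 12.0,  9.6, 12.0,  7.8, 10.8,
         8.7,  9.2,  8.2, 10.0, 12.2,  8.2,
        10.9,  8.3, 10.1,  9.9, 10.9, 10.3)
)

# Group-wise z-scores via ave() and scale(); FUN must be passed by name.
dat$z <- ave(dat$y, dat$group, FUN = scale)

# The same calculation written out by hand (z_manual is an illustrative name).
z_manual <- ave(dat$y, dat$group, FUN = function(v) (v - mean(v)) / sd(v))

# Both approaches should agree up to floating-point tolerance.
print(all.equal(as.numeric(dat$z), z_manual))

# Within each group the z-scores have mean ~0 and standard deviation 1.
print(tapply(dat$z, dat$group, mean))
print(tapply(dat$z, dat$group, sd))
```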

Jeromy Anglim
ndoogan
  • @Jeromy Anglim, if I have two categorical grouping variables, how can I adapt the syntax to get the z-score? For example, I have a list of plots, species, and numeric variables, and I want to normalize the numeric variable by both plot and species, because many species can be found in many plots. – Stackuser Jan 25 '20 at 22:05
  • @Stackuser where I've placed `dat$group` is in place of the `...` argument, which takes any number of grouping variables. Just add more variables. `dat$z <- ave(dat$y, dat$group_1, dat$group_2, FUN=scale)`. – ndoogan Feb 03 '20 at 16:45
  • thanks, if you assist me again, how can I magnify or take part of a graph and put in another place using R so that it can be more visible for audiences? – Stackuser Feb 05 '20 at 16:55
  • thanks! If you assist me again, how can I put the coordinates of the lowest point in the graph using R, for example, if the lowest point coordinates is (17, 17.1), I want to locate this point on the graph as (17, 17.1) and how do you do that? – Stackuser Feb 05 '20 at 17:19
  • @stackuser your question is out of the scope of this answer, which addresses a very specific question that has nothing to do with plotting. I think you should ask a totally new question to get help with your plotting problem. – ndoogan Feb 06 '20 at 17:18
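
For the two-grouping-variable case raised in the comments, a minimal sketch might look like the following. It assumes the question's `group` and `level` columns stand in for the commenter's plots and species, which is an assumption made here for illustration only.

```r
# Example data retyped from the question; group and level play the role of
# the two grouping variables (an assumption for illustration).
dat <- data.frame(
  group = rep(1:3, each = 6),
  level = rep(c("A", "B", "C"), times = 6),
  y = c(10.8, 12.0,  9.6, 12.0,  7.8, 10.8,
         8.7,  9.2,  8.2, 10.0, 12.2,  8.2,
        10.9,  8.3, 10.1,  9.9, 10.9, 10.3)
)

# Every extra unnamed argument before FUN= is treated as a grouping variable,
# so z-scores are computed within each group x level cell.
dat$z <- ave(dat$y, dat$group, dat$level, FUN = scale)

# Caveat: a cell whose values are all identical has sd = 0, so its z-scores
# are NaN. Here that happens for group 2 / level C (both values are 8.2).
print(dat[is.nan(dat$z), ])
```

With only two observations per cell, as in this toy data, the z-scores carry little information; the point is only to show where the extra grouping variables go.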
6

I would check out data.table for this.

Something like:

library(data.table)
datDT <- as.data.table(dat)
# scale() returns a one-column matrix, so flatten it to a plain numeric vector
datDT[, yScaled := as.numeric(scale(y)), by = group]
dmartin
3

You can use the ddply function of plyr and calculate the z score.

library(plyr)
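# transform keeps the original columns; summarize would return only group and z_score
dat <- ddply(dat, .(group), transform, z_score = as.numeric(scale(y)))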

Or you can calculate it manually:

dat <- ddply(dat, .(group), transform, z_score = (y - mean(y)) / sd(y))

If you have NAs in your data, then add na.rm = TRUE to the mean and sd calls.
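
To illustrate the NA point, here is a small sketch. It uses base R's ave() rather than plyr so that it runs without extra packages; the data frame `dat_na` and the helper `z_fun` are hypothetical names with made-up values.

```r
# Hypothetical toy data with one missing value in group 1.
dat_na <- data.frame(
  group = rep(1:2, each = 4),
  y = c(10.8, 12.0, NA, 9.6, 8.7, 9.2, 8.2, 10.0)
)

# Without na.rm = TRUE, mean() and sd() return NA, so every z-score
# in a group that contains an NA becomes NA.
z_naive <- ave(dat_na$y, dat_na$group,
               FUN = function(v) (v - mean(v)) / sd(v))

# With na.rm = TRUE, only the missing observation itself stays NA.
z_fun <- function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
dat_na$z <- ave(dat_na$y, dat_na$group, FUN = z_fun)

print(sum(is.na(z_naive)))   # number of NA z-scores without na.rm
print(sum(is.na(dat_na$z)))  # number of NA z-scores with na.rm = TRUE
```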

Hope this helps.

RHelp
3

With dplyr:

library(dplyr)

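dat_z <- dat %>%
  group_by(group) %>%
  # scale() returns a one-column matrix; as.numeric() keeps z_score a plain vector
  mutate(z_score = as.numeric(scale(y))) %>%
  ungroup()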
dmt