1

I have a dataframe with following variables

dat <- data.frame(cell.ID = 1:10, cell.name = letters[1:10], 
              groupID = rep(1:2, each = 5), 
              x1 = rnorm(10), x2 = rnorm(10), 
              x3 = rnorm(10), x4= rnorm(10), 
              x5 = rnorm(10), x6 = rnorm(10))

I have a mean and sd stored to normalise x1 to x6 in another dataframe

norm_fin <- data.frame(variable = paste0('x',1:6), 
                   meanVar = rnorm(6),
                   SdVar = rnorm(6))

I want to create a new dataframe from dat after normalised x1 to x6. I did a loop solution

 varVec <- paste0('x',1:6)
 dat1 <- dat
 for(i in varVec){
  meanRef <- norm_fin$meanVar[norm_fin$variable == i]
  sdRef <- norm_fin$SdVar[norm_fin$variable == i]
  dat1[, i] <- (dat[, i] - meanRef)/sdRef
}

Is there another solution without using a loop?

89_Simple
  • 3,393
  • 3
  • 39
  • 94
  • I am a bit confused what exactly you try to do. Why do you store the mean and sd? Why wouldn't an inbuild function like scale do the work? Or why not just calculating the mean (or sd) of the variable in your mutation? Like they do here: https://stackoverflow.com/questions/15215457/standardize-data-columns-in-r – Annet Jan 30 '20 at 12:51
  • the mean and sd were derived from a different dataset of x1 to x6 and I want to use those mean and sd to normalise the new x1 to x6 in the new data `dat` – 89_Simple Jan 30 '20 at 12:52
  • 1
    Why not simply join it so it is in the same row and than use mutate? Edit: oh well I see that is someone else's answer indeed – Annet Jan 30 '20 at 12:53

1 Answers1

2

We can convert the data into long format and then left join norm_find, calculate the value and then get the data back in wide format.

library(dplyr)
library(tidyr)

dat %>%
  pivot_longer(cols = starts_with('x')) %>%
  left_join(norm_fin, by = c('name' = 'variable')) %>%
  mutate(val = (value - meanVar)/SdVar) %>%
  select(-value, -meanVar, -SdVar) %>%
  pivot_wider(names_from = name, values_from = val)

# A tibble: 10 x 9
#   cell.ID cell.name groupID     x1      x2     x3    x4     x5      x6
#     <int> <fct>       <int>  <dbl>   <dbl>  <dbl> <dbl>  <dbl>   <dbl>
# 1       1 a               1 -2.10   32.6   -0.797 0.705 -0.768  0.0217
# 2       2 b               1 -1.36   16.3    0.125 0.353 -1.76   0.144 
# 3       3 c               1  2.63   17.0   -0.751 0.933  0.394  0.150 
# 4       4 d               1 -0.690  11.6   -0.429 0.925 -6.60  -0.461 
# 5       5 e               1 -0.559  -1.01  -0.316 0.898 -4.64   0.229 
# 6       6 f               2  2.98   43.2   -1.47  0.833  0.105 -0.525 
# 7       7 g               2  0.181  18.9    1.27  0.767 -1.36   0.802 
# 8       8 h               2 -3.67  -27.6    0.528 0.467 -1.23  -0.122 
# 9       9 i               2 -2.38   22.7   -0.873 0.348 -3.77   0.0778
#10      10 j               2 -1.84    0.557  1.72  0.311 -2.01   0.0379

data

set.seed(123)
dat <- data.frame(cell.ID = 1:10, cell.name = letters[1:10], 
                  groupID = rep(1:2, each = 5), 
                  x1 = rnorm(10), x2 = rnorm(10), 
                  x3 = rnorm(10), x4= rnorm(10), 
                  x5 = rnorm(10), x6 = rnorm(10))

norm_fin <- data.frame(variable = paste0('x',1:6), 
                       meanVar = rnorm(6),
                       SdVar = rnorm(6))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213