I have a dataset that looks something like this, but with hundreds of variables:

set.seed(123)
df <- data.frame(id= c(1,1,1,2,2,2,3,3,3), time=c(1,2,3,1,2,3,1,2,3),y = rnorm(9), x1 = rnorm(9), x2 = c(0,0,0,0,1,0,1,1,1), x3 = rnorm(9), c1 = rnorm(9),  c2 = rnorm(9))

I would like to standardize all my variables to ease interpretation after regression. I know I could standardize the variables one by one using BBmisc:

library(BBmisc)
df$z_y <- normalize(df$y, method = "standardize")

But doing this for every variable would be tedious and would make the script long and disorganized.

Since I am not comfortable writing loops or functions, I was wondering whether someone knows how to do this in one line (or a few), ideally while selecting only the relevant variables to standardize.

Also, it would be good if the function could detect dummies (such as x2) and avoid standardizing those.

Thank you in advance for your help.

Alex
  • It's already vectorized, try `normalize(df[3:8], method="standardize")`. – jay.sf Oct 21 '19 at 16:58
  • Also, if it's just a z-score, then you don't really need a specific package. You can do `data.frame(lapply(df[3:8], function(x) (x - mean(x))/sd(x)))`. – tmfmnk Oct 21 '19 at 16:59
  • Why not use the base R function `scale`? I.e. `scale(df[-(1:2)])` – Onyambu Oct 21 '19 at 16:59
  • Is there a way to have R automatically detect dummies or character strings and avoid standardizing those? – Alex Oct 21 '19 at 17:04
  • Yeah, supposing only the variables to be normalized are numeric (do `df[1:2] <- lapply(df[1:2], as.factor)` for testing), you could do `scale(df[lapply(df, class) == "numeric"])`. – jay.sf Oct 21 '19 at 17:30
  • Possible duplicate of [Standardize data columns in R](https://stackoverflow.com/questions/15215457/standardize-data-columns-in-r) – M-- Oct 21 '19 at 17:43
  • or with `dplyr` you can do `mutate_if(df, is.numeric, scale)` –  Oct 21 '19 at 17:57
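
Pulling the comments above together, here is a minimal base R sketch of one way to standardize only the non-dummy numeric columns. It is not taken from any of the commenters. The helpers `is_dummy` and `z_score` are hypothetical names introduced for illustration, and treating `id` and `time` as identifiers to skip is an assumption about the data. A dummy is taken to mean a column whose only values are 0 and 1.

```r
# Rebuild the example data from the question.
set.seed(123)
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  time = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  y = rnorm(9), x1 = rnorm(9),
  x2 = c(0, 0, 0, 0, 1, 0, 1, 1, 1),
  x3 = rnorm(9), c1 = rnorm(9), c2 = rnorm(9)
)

# Hypothetical helper: TRUE if the non-missing values are all 0 or 1.
is_dummy <- function(x) all(stats::na.omit(x) %in% c(0, 1))

# Hypothetical helper: z-score a numeric vector, ignoring missing values.
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Assumption: id and time are identifiers and should not be standardized.
skip <- c("id", "time")

# Columns to standardize: numeric, not a 0/1 dummy, not an identifier.
to_scale <- names(df)[
  sapply(df, is.numeric) & !sapply(df, is_dummy) & !(names(df) %in% skip)
]

# Add the standardized versions as new z_-prefixed columns,
# following the z_y naming used in the question.
df[paste0("z_", to_scale)] <- lapply(df[to_scale], z_score)
```

Character and factor columns are left alone because they fail `is.numeric`, and `x2` is left alone because it only contains 0 and 1. To overwrite the original columns instead of adding new ones, assigning to `df[to_scale]` should work the same way.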

0 Answers