
I want to standardize a number of columns in a dataframe, but not all columns. The columns to be manipulated are specified in a vector.
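For reference, standardizing a column replaces each value $x_i$ with its z-score. Here $\bar{x}$ is the column mean and $s$ is the sample standard deviation, using the $n - 1$ denominator that R's sd() and scale() both use:

$$
z_i = \frac{x_i - \bar{x}}{s}, \qquad s = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)^2}
$$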

To illustrate, take the following simulated dataframe:

set.seed(1)
mydf <- data.frame(matrix(sample(100, 36, replace = TRUE), nrow = 12))
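The names "X1" and "X2" used below are not set explicitly. data.frame() assigns default names to the columns of an unnamed matrix, which can be checked like this:

```r
set.seed(1)
mydf <- data.frame(matrix(sample(100, 36, replace = TRUE), nrow = 12))

# data.frame() gives the unnamed matrix columns the default names X1, X2, X3
names(mydf)  # [1] "X1" "X2" "X3"
dim(mydf)    # [1] 12  3
```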

Define the two columns to be manipulated (note that the solution should apply to a subset of columns identified by their names, not by their position in the dataframe):

variables <- c("X1", "X2")

I wrote the following loop to standardize the two columns, but it throws an error:

for (i in seq_along(variables)) {
  mydf[variables[i]] <- ((mydf[variables[i]] - mean(mydf[variables[i]], na.rm = TRUE)) / sd(mydf[variables[i]], na.rm = TRUE))
}

What is the correct way to do this? (I am a beginner to R.)

broti

3 Answers


You can use scale, and you do not need a loop:

mydf[variables] <- scale(mydf[variables])
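A quick way to confirm the result is to check that the named columns now have mean 0 and standard deviation 1, while the other columns are left alone. scale() ignores NA values when it computes each column's mean and standard deviation, matching the na.rm = TRUE in the question's loop. The variable original_X3 is only there for the check:

```r
set.seed(1)
mydf <- data.frame(matrix(sample(100, 36, replace = TRUE), nrow = 12))
variables <- c("X1", "X2")
original_X3 <- mydf$X3  # kept only to verify that X3 is not modified

# scale() returns a matrix; assigning it to mydf[variables]
# splits it back into the named data-frame columns
mydf[variables] <- scale(mydf[variables])

# Means are zero up to floating-point error, standard deviations are 1
round(sapply(mydf[variables], mean), 10)
sapply(mydf[variables], sd)
identical(mydf$X3, original_X3)
```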
Sven Hohenstein

The normalizeFeatures() function from the mlr package, with method = "standardize", will help you.

library(mlr)

set.seed(1)
mydf <- data.frame(matrix(sample(100, 36, replace = TRUE), nrow = 12))
variables <- c("X1", "X2")

# Standardize only the named columns and write the result back into mydf
mydf[variables] <- normalizeFeatures(mydf[variables], method = "standardize")
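If you would rather not add the mlr dependency, the same z-score can be sketched in base R with lapply(). The helper zscore is a hypothetical name introduced here. It uses na.rm = TRUE like the question's loop:

```r
set.seed(1)
mydf <- data.frame(matrix(sample(100, 36, replace = TRUE), nrow = 12))
variables <- c("X1", "X2")

# Hypothetical helper: z-score a numeric vector, ignoring NAs in mean and sd
zscore <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# lapply() applies zscore to each named column and returns a list,
# which [<- writes back into the same columns of mydf
mydf[variables] <- lapply(mydf[variables], zscore)
```

Because the columns are selected by name, this works no matter where X1 and X2 sit in the dataframe.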
Hunaidkhan
  • Very elegant, however, the solution needs to apply to a subset of columns that is known by column names (not their dataframe position). This was unclear from my question and I edited accordingly. – broti Oct 11 '18 at 08:51
  • updated the solution with specific column names – Hunaidkhan Oct 11 '18 at 08:56
  • Following up my previous comment, replacing `mydf[c(1,2)]` with `mydf[variables]` in your last line of code does what I want. – broti Oct 11 '18 at 09:01
  • You just have to change it to variables; that's not a big deal. – Hunaidkhan Oct 11 '18 at 13:39

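To get your loop working, use [[ instead of [ inside mean() and sd(). mydf[variables[i]] returns a one-column data frame, whereas mydf[[variables[i]]] returns the column itself as a vector, which is what mean() and sd() expect.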

for (i in seq_along(variables)) {
  mydf[variables[i]] <-
    ((mydf[variables[i]] - mean(mydf[[variables[i]]], na.rm = TRUE)) / sd(mydf[[variables[i]]], na.rm = TRUE))
}
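To see why single brackets break the original loop, compare what [ and [[ return and how mean() and sd() react to each. The exact warning and error messages may vary between R versions:

```r
set.seed(1)
mydf <- data.frame(matrix(sample(100, 36, replace = TRUE), nrow = 12))

# Single brackets return a one-column data frame; double brackets return the vector
class(mydf["X1"])    # "data.frame"
class(mydf[["X1"]])  # "integer"

# mean() on a data frame warns and returns NA; sd() cannot coerce it and errors
suppressWarnings(mean(mydf["X1"]))
tryCatch(sd(mydf["X1"]), error = function(e) conditionMessage(e))

# On the extracted vector both functions work
mean(mydf[["X1"]])
sd(mydf[["X1"]])
```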

But consider using scale() instead; see @SvenHohenstein's answer.

markus