
I wrote the code below to normalize my data frame in R, but it fails with the error shown below.

for(i in 1:56){
  clean_data[(clean_data[,i]),i] <-(clean_data[,i] - min(clean_data[,i])) / (max(clean_data[,i]) - min(clean_data[,i]))
}

Error

Error: cannot allocate vector of size 9.9 Gb

Is there another way to implement the normalization? Can anyone help me?

vinay karagod

2 Answers


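You could use this snippet as an improvement to your code:

# Column-wise maximum and minimum (the first column is excluded)
max_col <- apply(clean_data[, -1], 2, max)
min_col <- apply(clean_data[, -1], 2, min)
# Subtract each column's minimum and divide by its range, giving values in [0, 1]
clean_data_scaled <- as.data.frame(scale(clean_data[, -1], center = min_col, scale = max_col - min_col))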

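Or, if standardizing each column to mean 0 and standard deviation 1 (rather than rescaling to the 0 to 1 range) is acceptable: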

clean_data_scaled <- scale(clean_data[, -1])
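To see what the two calls do, here is a small sketch on a made-up data frame. The name toy and its values are hypothetical stand-ins for clean_data, and the first column is assumed to be an identifier that should not be scaled.

```r
# Hypothetical toy data standing in for clean_data
toy <- data.frame(id = 1:4, a = c(10, 20, 30, 50), b = c(-1, 0, 1, 3))
num <- toy[, -1]                       # drop the id column

min_col <- sapply(num, min)
max_col <- sapply(num, max)

# Min-max scaling: each column ends up between 0 and 1
minmax <- as.data.frame(scale(num, center = min_col, scale = max_col - min_col))

# Default scale(): each column ends up with mean 0 and standard deviation 1
zscore <- as.data.frame(scale(num))

print(sapply(minmax, range))
print(sapply(zscore, sd))
```

Both versions work on whole columns at once, so no explicit loop over rows or columns is needed.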
Prem
    These are definitely better ways to accomplish the task. @vinyay, be sure you understand why this code works so that it becomes a real tool, or ask Prem to explain what the steps do. – sconfluentus Jul 08 '17 at 20:42

The way your assignment is indexed is a bit off, if I understand your intent correctly:

You are asking the loop to store, in each row of each column, that cell's value minus the minimum of the whole column (i), divided by the difference between that column's maximum and minimum.
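As a rough illustration of why the original indexing misbehaves: in clean_data[(clean_data[,i]),i] the row index is the column's own values, so R treats those values as row positions. The sketch below uses a hypothetical one-column data frame and a plain vector. That large data values then force a huge allocation is my reading of the error, not something the question confirms.

```r
# Hypothetical data: the values 3, 1, 2 get used as row positions
toy <- data.frame(a = c(3, 1, 2))
picked <- toy[toy[, 1], 1]   # reads rows 3, 1, 2 in that order
print(picked)

# With a plain vector, assigning at a position past the end extends it,
# so a large value used as an index asks R for a much longer vector
v <- c(1, 5)
v[v] <- 0                    # writes to positions 1 and 5
print(length(v))
```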

To do this, you should let it look at all the rows as well, working down each row (j) in each column (i) before moving on to the next column.

This is not how I would normalize my data personally, but to build on what you seem to be doing, the way to do it is to add a second loop and iterator:

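for (i in 1:56) {
  # Compute the column's minimum and range once, before any cell is overwritten;
  # otherwise they change as soon as the first cells are rescaled
  col_min <- min(clean_data[, i])
  col_range <- max(clean_data[, i]) - col_min
  for (j in seq_len(nrow(clean_data))) {
    clean_data[j, i] <- (clean_data[j, i] - col_min) / col_range
  }
}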

This takes the value at the current row and column and rescales it using your formula, with the minimum and maximum taken over the full column.
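If the inner loop turns out to be slow on a large data frame, the same formula can be applied to a whole column at a time. This is only a sketch: min_max is a hypothetical helper name, toy stands in for clean_data, and the column names are made up.

```r
# Hypothetical helper: rescale one numeric vector to the 0 to 1 range
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)          # c(min, max) of the column
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy data standing in for clean_data
toy <- data.frame(id = 1:4, a = c(10, 20, 30, 50), b = c(-1, 0, 1, 3))

# Apply the helper to the chosen columns and write the results back in place
num_cols <- c("a", "b")
toy[num_cols] <- lapply(toy[num_cols], min_max)
print(toy)
```

Note that a column whose values are all identical has a range of zero, so this formula would return NaN for it; such columns would need separate handling.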

sconfluentus