Hey awesome community,
I am trying to learn how to use loops to iterate over the columns of a data set. I'm using the sns data set, which is provided free for machine learning, and trying to run a k-means cluster analysis. The first thing I need to do is center and scale the variables. I want to do this with a loop, and I need to select all but the first four variables in the data set. Here's what I tried, and I'm not sure why it doesn't work:
for (i in names(sns.nona[, -c(1:4)])) {
  scale(i, center = TRUE, scale = TRUE)
}
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
I get the above error, which must mean the loop is passing only the column name to scale(), not the actual column of the data set. I guess I should expect that, but how do I make it reference the data?
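To check that guess, here is a minimal sketch on a made-up data frame (`df` and its columns are hypothetical stand-ins, not the sns data). It suggests that `i` holds a character string, while `df[[i]]` retrieves the numeric column with that name:

```r
# Hypothetical toy data frame standing in for sns.nona
df <- data.frame(id = 1:3, friends = c(0, 5, 10), music = c(2, 0, 4))

for (i in names(df)[-1]) {
  # i is a column name (a character string), not the column's values
  print(class(i))
  # Indexing the data frame by that name returns the underlying data
  print(class(df[[i]]))
}
```

If that holds for the real data too, scale() is being handed a string, which would explain the "'x' must be numeric" error.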
Edit: I also tried:
for (i in names(sns.nona)[-c(1:4)]) {
  scale(sns.nona[, i], center = TRUE, scale = TRUE)
}
This did not return an error, but it does not appear to be centering the data. I should get some negative values wherever the original value was 0, since I'd be subtracting the column mean from it...
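One possible explanation is that scale() returns a new object rather than modifying the data frame in place, so the loop computes the scaled values and then discards them. Here is a minimal sketch of assigning the result back, again on a made-up data frame (`df`, `scaled`, and `scaled2` are hypothetical names, not part of the sns data):

```r
# Hypothetical toy data frame standing in for sns.nona
df <- data.frame(id = 1:3, friends = c(0, 5, 10), music = c(2, 0, 4))

scaled <- df  # work on a copy so the original stays intact
for (i in names(df)[-1]) {
  # scale() returns a one-column matrix; as.numeric() turns it into a plain vector
  scaled[[i]] <- as.numeric(scale(df[[i]], center = TRUE, scale = TRUE))
}

# Loop-free alternative: scale() also accepts several numeric columns at once
scaled2 <- df
scaled2[-1] <- scale(df[-1])

print(scaled)
```

With this toy data, the zeros in the original columns become negative after centering, which is the behavior I was expecting to see.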