Given the example data set below:
df <- as.data.frame(matrix(c(1, 2, 3, NA, 5, NA,
                             7, NA, 9, 10, NA, NA), nrow = 2, ncol = 6))
names(df) <- c("varA", "varB", "varC", "varD", "varE", "varF")
print(df)
  varA varB varC varD varE varF
1    1    3    5    7    9   NA
2    2   NA   NA   NA   10   NA
I'd like to run kmeans(...) on data sets without having to manually check for, or delete, variables that contain an NA anywhere. Although I'm asking about kmeans(...) here, I'll use the same approach with other functions, so a kmeans(...)-specific answer won't fully solve my problem.
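One way to make this reusable for kmeans(...) and other functions, sketched under the assumption that a column should be dropped as soon as it has a single NA, is a small helper that returns only the complete columns. The name drop_na_cols() is hypothetical and not part of base R; vapply() and anyNA() are base R functions.

```r
# Hypothetical helper (not part of base R): keep only the columns of a
# data frame that contain no NA values. A new data frame is returned;
# the input is not modified.
drop_na_cols <- function(data) {
  # TRUE for each column that has at least one NA
  has_na <- vapply(data, anyNA, logical(1))
  # drop = FALSE keeps a data frame even if only one column survives
  data[, !has_na, drop = FALSE]
}

df <- as.data.frame(matrix(c(1, 2, 3, NA, 5, NA,
                             7, NA, 9, 10, NA, NA), nrow = 2, ncol = 6))
names(df) <- c("varA", "varB", "varC", "varD", "varE", "varF")

clean <- drop_na_cols(df)
print(names(clean))
```

On a data set with enough rows, the helper can wrap the argument of any function that needs complete data, e.g. kmeans(drop_na_cols(df), 10).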
The manual version of what I'd like is:
kmeans_model <- kmeans(df[, -c(2:4, 6)], 10)
And the pseudo-code would be:
kmeans_model <- kmeans(df[, -c(colnames(is.na(df)))], 10)
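A runnable counterpart to the pseudo-code, again assuming any column with at least one NA should be excluded, can build a logical column index from colSums(is.na(df)). As written, colnames(is.na(df)) would return every column name, because is.na(df) is a logical matrix with the same columns as df.

```r
df <- as.data.frame(matrix(c(1, 2, 3, NA, 5, NA,
                             7, NA, 9, 10, NA, NA), nrow = 2, ncol = 6))
names(df) <- c("varA", "varB", "varC", "varD", "varE", "varF")

# is.na(df) is a logical matrix; colSums() counts the NAs in each column
na_counts <- colSums(is.na(df))

# Keep only columns with zero NAs; drop = FALSE guarantees a data frame
complete_cols <- df[, na_counts == 0, drop = FALSE]

# Compare with the manual selection df[, -c(2:4, 6)]
print(identical(names(complete_cols), names(df[, -c(2:4, 6)])))

# On the real data (with enough rows) this replaces the manual call:
# kmeans_model <- kmeans(df[, colSums(is.na(df)) == 0, drop = FALSE], 10)
```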
Also, I don't want to delete any data from df itself; the NA columns should only be left out of the call. Thanks in advance.
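Column subsetting such as df[, j] returns a new data frame and leaves df unchanged, so excluding the NA columns inside a call does not delete anything from df. A minimal check, using the example df from the question:

```r
df <- as.data.frame(matrix(c(1, 2, 3, NA, 5, NA,
                             7, NA, 9, 10, NA, NA), nrow = 2, ncol = 6))
names(df) <- c("varA", "varB", "varC", "varD", "varE", "varF")

# Build the subset that would be passed to kmeans(); nothing is assigned back to df
subset_used <- df[, colSums(is.na(df)) == 0, drop = FALSE]

# df still has all six columns, including those with NAs
print(ncol(df))
print(ncol(subset_used))
```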
(Obviously kmeans(...) won't work on this example data set, since it has only two rows and cannot be split into 10 clusters, but I can't share the real data set here.)