I'm new to R and Stack Overflow, so my question probably has a lot of mistakes; sorry in advance.
I'm using the cor() function (from the stats package), and it took me an hour to fix a small problem, but I still don't understand what was wrong. Basically, I have a data.frame and I want to flag numeric variables that are highly correlated. So I create a subset of the numeric variables, excluding SalePrice, which has NAs in the test set:
numericCols <- which(sapply(full[,!(names(full) %in% 'SalePrice')], is.numeric))
Then cor(full[, numericCols]) gives an error:
Error in cor(full[, numericCols]) : 'x' must be numeric.
But when I do it this way instead:
numericCols2 <- which(sapply(full, is.numeric))
numericCols2 <- numericCols2[-31] # drop SalePrice (the 31st numeric column) manually
it works just fine.
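Here is a minimal sketch that reproduces both behaviours on a toy data frame. The columns (LotArea, SalePrice, SaleCondition, HouseAge) and their values are hypothetical stand-ins for full, and numericCols2[-2] plays the role of numericCols2[-31] because SalePrice is the 2nd numeric column here:

```r
# Hypothetical stand-in for `full`: SalePrice is NA for the "test" rows,
# and HouseAge is a derived column sitting after SalePrice.
full <- data.frame(
  LotArea       = c(8450, 9600, 11250, 9550),
  SalePrice     = c(208500, 181500, NA, NA),
  SaleCondition = c("Normal", "Normal", "Abnorml", "Normal"),
  HouseAge      = c(5, 31, 7, 91),
  stringsAsFactors = FALSE
)

# Approach 1: look for numeric columns in a copy without SalePrice.
numericCols <- which(sapply(full[, !(names(full) %in% "SalePrice")], is.numeric))

# Approach 2: look for numeric columns in full, then drop SalePrice by position.
numericCols2 <- which(sapply(full, is.numeric))
numericCols2 <- numericCols2[-2]  # SalePrice is the 2nd numeric column in this toy

print(numericCols)                  # named positions LotArea = 1, HouseAge = 3
print(numericCols2)                 # named positions LotArea = 1, HouseAge = 4
print(numericCols == numericCols2)  # TRUE for LotArea, FALSE for HouseAge

# Approach 1 selects LotArea and SaleCondition from full, so cor() fails;
# approach 2 selects LotArea and HouseAge and works.
res1 <- tryCatch(cor(full[, numericCols]), error = function(e) conditionMessage(e))
res2 <- cor(full[, numericCols2])
print(res1)
print(res2)
```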
When I do numericCols == numericCols2, the output is:
LotFrontage    TRUE
LotArea        TRUE
# .
# . All true
# .
HouseAge       FALSE
isNew          FALSE
Remodeled      FALSE
BsmtFinSF      FALSE
PorchSF        FALSE
All the FALSE ones are variables I created myself, for example HouseAge:
full$HouseAge <- full$YrSold - full$YearBuilt
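A small sketch of where a derived column such as HouseAge ends up, using hypothetical toy columns (only the names matter here): assigning with $<- appends the new column after all existing ones, so it sits to the right of SalePrice, and its position differs by one between full and a copy without SalePrice:

```r
# Hypothetical toy columns standing in for the real data.
full <- data.frame(
  YearBuilt = c(2003, 1976, 2001),
  YrSold    = c(2008, 2007, 2008),
  SalePrice = c(208500, 181500, NA)
)

# $<- appends the new column as the last one, i.e. after SalePrice.
full$HouseAge <- full$YrSold - full$YearBuilt
print(names(full))

# Position of HouseAge in full vs. in the copy without SalePrice.
withoutPrice <- full[, !(names(full) %in% "SalePrice")]
posFull <- match("HouseAge", names(full))          # 4
posSub  <- match("HouseAge", names(withoutPrice))  # 3
print(c(full = posFull, withoutSalePrice = posSub))
```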
Why is this happening?