I have a dataset "rf1" with 845 features and 1052 rows and, in order to do ML, I want to eliminate the highly correlated features. I wrote this code, but it only shows me the features and their correlations without eliminating them:
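`library(dplyr)  # provides %>% and mutate_if
corr_simple <- function(rf1, sig = 0.9) {
  # Convert character columns to factors, then factors to numeric codes
  df_cor <- rf1 %>% mutate_if(is.character, as.factor)
  df_cor <- df_cor %>% mutate_if(is.factor, as.numeric)
  corr <- cor(df_cor)
  # Keep only the upper triangle so each pair appears once
  corr[lower.tri(corr, diag = TRUE)] <- NA
  corr[corr == 1] <- NA
  # Reshape to long format (Var1, Var2, Freq) and drop the NA entries
  corr <- as.data.frame(as.table(corr))
  corr <- na.omit(corr)
  # Keep pairs above the threshold, sorted by absolute correlation
  corr <- subset(corr, abs(Freq) > sig)
  corr <- corr[order(-abs(corr$Freq)), ]
  print(corr)
  mtx_corr <- reshape2::acast(corr, Var1 ~ Var2, value.var = "Freq")
}
corr_simple(rf1)`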
Here is the result, but I want to eliminate the variables whose correlation exceeds a threshold of 0.9: MY RESULTS
When I use functions found here, like this one, I get the following error message:
`data<-data.frame(rf1)
cor_matrix <- cor(data)
cor_matrix_rm <- cor_matrix
cor_matrix_rm[upper.tri(cor_matrix_rm)] <- 0
diag(cor_matrix_rm) <- 0
cor_matrix_rm
data_new <- data[ , !apply(cor_matrix_rm, 2, function(x) any(x > 0.90))]
Error in [.data.frame(data, , !apply(cor_matrix_rm, 2, function(x) any(x > :
undefined columns selected`
I searched and tried other solutions, but I always run into this same problem...
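For reference, here is a minimal sketch of the kind of function being asked for, written in base R. The name `drop_correlated`, its `cutoff` argument, and the toy data are hypothetical and not from any package. It assumes the "undefined columns selected" error comes from `NA` entries in the correlation matrix (for example, from constant columns), which make `any(x > 0.90)` return `NA`; that cause has not been confirmed on `rf1`. It also uses a simple rule: for each highly correlated pair, the column that appears first is dropped, which can remove more columns than strictly necessary.

```r
# Hypothetical helper: drop one feature from each pair whose absolute
# correlation exceeds `cutoff`. A sketch, not a tested solution for rf1.
drop_correlated <- function(df, cutoff = 0.9) {
  # Work on a numeric copy: characters -> factors -> integer codes
  num <- df
  num[] <- lapply(df, function(col) {
    if (is.character(col)) col <- as.factor(col)
    if (is.factor(col)) as.numeric(col) else col
  })
  # Constant columns trigger a "standard deviation is zero" warning and give NA
  cm <- suppressWarnings(abs(cor(num, use = "pairwise.complete.obs")))
  cm[is.na(cm)] <- 0                    # assumption: treat NA as "not correlated"
  cm[upper.tri(cm, diag = TRUE)] <- 0   # keep each pair once, below the diagonal
  # A column is dropped if it is highly correlated with any later column
  to_drop <- apply(cm, 2, function(x) any(x > cutoff))
  df[, !to_drop, drop = FALSE]
}

# Toy data (made up for illustration): b is a near-copy of a, k is constant
set.seed(1)
x <- rnorm(100)
toy <- data.frame(a = x, b = x + rnorm(100, sd = 0.01), c = rnorm(100), k = 1)
toy_new <- drop_correlated(toy)   # the near-duplicate pair a/b loses its first member
names(toy_new)
```

With the real data this would be called as `rf1_new <- drop_correlated(rf1, cutoff = 0.9)`. Whether treating `NA` correlations as zero is appropriate depends on the data; removing constant columns beforehand may be the cleaner choice.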