I am new to R programming and trying to learn part time so apologize for naive coding and questions in advance. I have spent about 1 day trying to figure out code for this and unable to do so hence asking here.
https://www.kaggle.com/c/titanic/data?select=train.csv
I am working on train Titanic Data set from Kaggle imported as train_data. I have cleaned up all the col and also converted them to factor where needed.
My question is 2 fold:
1. Unable to understand why this formula gives IV values as 0 for everything. What have I done wrong?
factor_vars <- colnames(train_data)
all_iv <- data.frame(VARS=factor_vars, IV=numeric(length(factor_vars)),STRENGTH=character(length(factor_vars)),stringsAsFactors = F)
for (factor_var in factor_vars){
all_iv[all_iv$VARS == factor_var, "IV"] <-
InformationValue::IV(X=train_data[, factor_var], Y=train_data$Survived)
all_iv[all_iv$VARS == factor_var, "STRENGTH"] <-
attr(InformationValue::IV(X=train_data[, factor_var], Y=train_data$Survived), "howgood")
}
all_iv <- all_iv[order(-all_iv$IV), ]
2. I am trying to create my own function to calculate IV values for multiple columns in 1 go so that I do not have to do repetitive task however when I run the following formula I get count of total 0 and total 1 instead of items grouped by like I requested. Again, what is that I am doing wrong in this example?
train_data %>% group_by(train_data[[3]]) %>%
summarise(zero = sum(train_data[[2]]==0),
one = sum(train_data[[2]]==1))
I get output
zero one
1 549 342
2 549 342
3 549 342
where as I would anticipate an answer like:
zero one
1 80 136
2 97 87
3 372 119
what is wrong with my code?
3. Is there any pre built function which can give IV values for all columns? On searching I found iv.mult function but I can not get it to work. Any suggestion would be great.