I have just started using mlr3 and am still very unfamiliar with the syntax. I have two questions:
- How can I access the coefficients of a trained logistic regression model in mlr3?
- I am dealing with an extremely imbalanced dataset (98% vs. 2%) with over 2 million rows. I tried SMOTE, but it is very slow, whereas the same step finishes quickly in Python. Is there a mistake in my code? Here is my code:
library(mlr3)
library(mlr3pipelines)
library(mlr3viz)
task = TaskClassif$new("pcs", backend = pcs, target = "navigator", positive = "1")
table(task$truth())
po_over = po("classbalancing", id = "oversample", adjust = "minor", reference = "minor", shuffle = FALSE, ratio = 16)
table(po_over$train(list(task))$output$truth())
learner = mlr_learners$get("classif.rpart")
learner$predict_type = "prob"
learner = GraphLearner$new(po_over %>>% learner)  # wrap the graph so resample() gets a Learner
resampling = rsmp("holdout", ratio = 0.8)
rr = resample(task, learner, resampling, store_models = TRUE)
res = rr$prediction()
auto1 = autoplot(res)
auto2 = autoplot(res, type = "roc")
print(rr$score(msr("classif.acc"))$classif.acc)
And here is the SMOTE pipeline:
gr_smote =
  po("colapply", id = "int_to_num",
    applicator = as.numeric, affect_columns = selector_type("integer")) %>>%
  po("smote", dup_size = 15) %>>%
  po("colapply", id = "num_to_int",
    applicator = function(x) as.integer(round(x, 0L)), affect_columns = selector_type("numeric"))