I'm trying to use the cross_validate function and SMOTE together in a classification problem, and I want to know how to do it correctly.
This is the simple function I use to run cross-validation for a machine learning classification algorithm:
def bayes(dataIn, dataOut, cv, statistic):
    # training method
    naive_bayes = GaussianNB()
    # applying the method
    outputBayes = cross_validate(estimator=naive_bayes,
                                 X=dataIn, y=dataOut,
                                 cv=cv, scoring=statistic)
    return outputBayes
I read the cross_validate documentation to see whether I could specify the training and testing datasets myself before calling cross_validate, instead of passing the complete dataIn and dataOut. I need this because I want to use SMOTE, and to use it correctly I have to oversample only the training portion of each fold; if I apply SMOTE across the whole dataset before cross-validation, the results will be skewed by data leakage.
How can I solve this? Should I write my own cross-validation function? I would rather not, because the return value of cross_validate is very convenient and I don't see how to reproduce exactly the same output myself.
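For context, here is a minimal sketch of what I think might work, assuming the imbalanced-learn package is installed: wrapping SMOTE and the classifier in an imblearn Pipeline, so that cross_validate fits SMOTE only on the training split of each fold (the dataset built with make_classification here is just a stand-in for my real data):

    # Sketch: per-fold SMOTE via an imbalanced-learn Pipeline.
    # Assumes the imbalanced-learn package (imblearn) is installed.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_validate
    from sklearn.naive_bayes import GaussianNB
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline

    # Stand-in imbalanced dataset (90% / 10% classes).
    X, y = make_classification(n_samples=200, weights=[0.9, 0.1],
                               random_state=0)

    # SMOTE is resampled inside each training fold only; the test fold
    # of each split is left untouched, so scores are not skewed.
    model = Pipeline([
        ("smote", SMOTE(random_state=0)),
        ("nb", GaussianNB()),
    ])

    out = cross_validate(estimator=model, X=X, y=y,
                         cv=5, scoring="accuracy")
    print(out["test_score"])

Is this the right approach, or is there a way to do it with cross_validate alone?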
I saw other questions on this topic, but none of them answers my specific question:
SMOTE oversampling and cross-validation
Function for cross validation and oversampling (SMOTE)
Does oversampling happen before or after cross-validation using imblearn pipelines?