I am working on a multi-label text classification problem with 90 target labels. The data has a long-tailed label distribution and around 1.9 million records. Currently, I am working on a small sample of around 100k records with a similar target distribution.
Some algorithms, such as PassiveAggressiveClassifier (PAC) and LinearSVC, provide a class_weight='balanced' parameter to compensate for class imbalance. On top of that, I am using SMOTE to oversample every class except the majority class, and RandomUnderSampler to undersample the majority class.
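For context, here is a minimal sketch of what class_weight='balanced' computes under the hood: weights inversely proportional to class frequencies, i.e. n_samples / (n_classes * np.bincount(y)). The toy label array y below is made up purely for illustration; in my real problem the target is a multi-label matrix, not a flat array.

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical toy labels just to show the weighting behaviour.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])

classes = np.unique(y)
# 'balanced' weights = n_samples / (n_classes * np.bincount(y)),
# so rarer classes receive proportionally larger weights.
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y)
print(dict(zip(classes, weights)))  # {0: 0.5, 1: 1.5, 2: 3.0}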
Is it correct to use both the algorithm-level class_weight parameter and the imblearn resampling steps in the same pipeline to handle class imbalance?
from sklearn.pipeline import FeatureUnion
from sklearn.linear_model import PassiveAggressiveClassifier, LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import StackingClassifier
from sklearn.multiclass import OneVsRestClassifier
from imblearn.pipeline import Pipeline as imblearnPipeline
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# text_pipeline (the text feature extraction pipeline) is defined elsewhere.
feat_pipeline = FeatureUnion([('text', text_pipeline)])

estimators_list = [
    ('PAC', PassiveAggressiveClassifier(max_iter=5000, random_state=0, class_weight='balanced')),
    ('linearSVC', LinearSVC(class_weight='balanced'))
]
estimators_ensemble = StackingClassifier(estimators=estimators_list,
                                         final_estimator=LogisticRegression(solver='lbfgs', max_iter=5000))
ovr_ensemble = OneVsRestClassifier(estimators_ensemble)

classifier_pipeline = imblearnPipeline([
    ('features', feat_pipeline),
    ('over_sampling', SMOTE(sampling_strategy='auto')),  # resample all classes but the majority class
    ('under_sampling', RandomUnderSampler(sampling_strategy='auto')),  # resample all classes but the minority class
    ('ovr_ensemble', ovr_ensemble)
])
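For reference, this is how I fit and apply the pipeline; X_train, y_train, and X_test are placeholders for my actual train/test splits. Note that the imblearn pipeline applies the resampling steps only during fit, not at prediction time.

# Hypothetical usage; X_train/y_train/X_test stand in for my real splits.
classifier_pipeline.fit(X_train, y_train)     # SMOTE/undersampling run here
y_pred = classifier_pipeline.predict(X_test)  # no resampling at predict time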