Outlier Detection in Scikit-learn (Isolation Forest) in a pipeline

Question

I am having trouble using the Isolation Forest algorithm inside a scikit-learn pipeline. I am trying to predict fraudulent credit card transactions using the Kaggle Credit Card Fraud Detection dataset. To avoid data leakage, I want every preprocessing step to be fitted only after the data is partitioned, so I wrap the steps in a pipeline for each cross-validation run. (Without pipelines, I get an almost 100% F1-score using Logistic Regression in K-fold cross-validation.) Most machine learning algorithms work this way (Logistic Regression, Random Forest Classifier, etc.), but some anomaly detection algorithms, such as IsolationForest, do not. How can I fit these anomaly detection algorithms inside a pipeline? Thanks.

Some details on X and Y (in Y, 0 is a normal transaction and 1 is a fraudulent transaction):

pipe = Pipeline([
    ('sc', StandardScaler()),
    ('smote', SMOTE()),
    ('IF', IsolationForest())
])

print(cross_val_score(pipe, X, Y, scoring='f1_weighted', cv=5))

# Result: [3.01179163e-06 3.53204982e-06 6.55363495e-06 3.51940600e-06 4.52981524e-06]

Please provide code snippets and sample data if needed, so other users can better understand your problem and suggest solutions. — mac13k, Jun 30 '20 at 13:35
I have added some details. Feel free to ask if anything is still unclear. — Lee Zhao Jun, Jul 01 '20 at 03:30

score 0 · Answer 1 · answered Sep 14 '20 at 09:06

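Without further information, I would guess that your Pipeline import is from sklearn.pipeline. The scikit-learn Pipeline requires every intermediate step to be a transformer, and SMOTE is a sampler rather than a transformer, so it needs the imbalanced-learn pipeline. Just replace the import with: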

from imblearn.pipeline import Pipeline

For further information this helped me.
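A further point, not covered by the answer above, may explain the near-zero scores in the question: `IsolationForest.predict` returns -1 for outliers and +1 for inliers, while Y uses 0 and 1, so the F1 scorer compares two different label sets. The sketch below shows one possible way to align them. The wrapper class `IsolationForestClassifier`, the synthetic data, and `contamination=0.05` are all hypothetical choices made for this example.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class IsolationForestClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: reports outliers as 1 and inliers as 0."""

    def __init__(self, contamination='auto', random_state=None):
        self.contamination = contamination
        self.random_state = random_state

    def fit(self, X, y=None):
        # IsolationForest is unsupervised, so y is accepted but ignored.
        self.model_ = IsolationForest(
            contamination=self.contamination,
            random_state=self.random_state,
        ).fit(X)
        self.classes_ = np.array([0, 1])
        return self

    def predict(self, X):
        # Map IsolationForest's -1 (outlier) / +1 (inlier) to 1 / 0.
        return (self.model_.predict(X) == -1).astype(int)


# Synthetic stand-in data (an assumption): 950 "normal" points near the
# origin and 50 widely scattered points labelled 1.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0, 1, size=(950, 4)),
    rng.uniform(-10, 10, size=(50, 4)),
])
Y = np.array([0] * 950 + [1] * 50)

pipe = Pipeline([
    ('sc', StandardScaler()),
    ('IF', IsolationForestClassifier(contamination=0.05, random_state=0)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, Y, scoring='f1', cv=cv)
print(scores)
```

SMOTE is left out here because Isolation Forest never looks at the labels, and oversampling the rare class would make the anomalies less rare and therefore harder to isolate. Whether that trade-off holds on the real data is something to check rather than assume.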

maltequast