
I'm trying to use the DiscriminationThreshold visualizer on my fitted models. They're all binary classifiers (logistic regression, LightGBM, and XGBClassifier). However, following the documentation, I'm having a hard time producing the plot for already fitted models. My code is the following:

# test is a logistic regression model 
from yellowbrick.classifier import DiscriminationThreshold
visualizer = DiscriminationThreshold(test, is_fitted = True)
visualizer.show()

The output of this is the following: [Image: empty DiscriminationThreshold visual]

Can someone please help me understand how to use DiscriminationThreshold properly on a fitted model? I tried it with the others (LightGBM and XGBoost) and got an empty plot as well.

jdsurya
  • The required argument of `DiscriminationThreshold` should be an estimator; I think you're trying to pass a dataset. Check the [docs](https://www.scikit-yb.org/en/latest/api/classifier/threshold.html#yellowbrick.classifier.threshold.DiscriminationThreshold) – Miguel Trejo Dec 21 '21 at 02:58
  • Hey! `test` is a model object :) I just called it test because I was trying to get the function to work – data_newbie14 Dec 21 '21 at 03:18
  • can you provide some sample data to reproduce? – Miguel Trejo Dec 21 '21 at 03:20

1 Answer


The DiscriminationThreshold visualizer works as an evaluator of a model and requires an evaluation data set. This means you need to fit the visualizer regardless of whether your model is already fitted. You seem to have omitted this step because your model is already fitted.

Try something like this:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from yellowbrick.classifier import DiscriminationThreshold
from yellowbrick.datasets import load_spam

# Load a binary classification dataset and split
X, y = load_spam()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Instantiate and fit the LogisticRegression model
model = LogisticRegression(solver="liblinear")
model.fit(X_train, y_train)

# Wrap the fitted model; the visualizer still needs data to score it on
visualizer = DiscriminationThreshold(model, is_fitted=True)
visualizer.fit(X_test, y_test)  # Pass the held-out data to the visualizer
visualizer.show()


jdsurya
  • I didn't realize that it still required data. So will the threshold plot be based on the evaluation of the test set? Will update if this works on my end – data_newbie14 Dec 21 '21 at 03:20
  • Yes, actually. It splits the provided data set into multiple smaller sets and evaluates the given model. The bands you see here show the variation of the performance over those smaller sets, and the solid lines are the averages. – jdsurya Dec 21 '21 at 11:32
  • This takes a very long time on a dataset of around 300k. Is there any way I can speed it up? Also, is_fitted is 'auto', which means it checks whether the model object is fitted, no? – Maths12 Mar 24 '22 at 22:25
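
To make the "multiple smaller sets" idea from the comments concrete, here is a minimal pure-Python sketch of scoring one set of predicted probabilities at several thresholds over repeated random subsets. It is an illustration only and not Yellowbrick's actual implementation: `precision_recall_f1`, `threshold_curves`, `subset_fraction`, and the toy data are hypothetical names made up for this example, and the band here is simply min/max rather than quantiles.

```python
import random
from statistics import median


def precision_recall_f1(y_true, scores, threshold):
    """Score the labels obtained by calling score >= threshold positive."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    # Convention chosen for this sketch: precision is 1.0 when nothing
    # is predicted positive, recall is 0.0 when there are no positives.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def threshold_curves(y_true, scores, thresholds, n_trials=10,
                     subset_fraction=0.5, seed=0):
    """Return {threshold: (lowest, median, highest) F1} over random subsets."""
    rng = random.Random(seed)
    n = len(y_true)
    k = max(1, int(n * subset_fraction))
    per_threshold = {t: [] for t in thresholds}
    for _ in range(n_trials):
        # Each trial scores a different random subset of the data.
        idx = rng.sample(range(n), k)
        y_sub = [y_true[i] for i in idx]
        s_sub = [scores[i] for i in idx]
        for t in thresholds:
            per_threshold[t].append(precision_recall_f1(y_sub, s_sub, t)[2])
    return {t: (min(v), median(v), max(v)) for t, v in per_threshold.items()}


# Toy labels and predicted probabilities (made up for illustration).
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.85, 0.95]
thresholds = [0.25, 0.5, 0.75]

curves = threshold_curves(y_true, scores, thresholds)
for t in thresholds:
    low, mid, high = curves[t]
    print(f"threshold={t:.2f}  F1 band=[{low:.2f}, {high:.2f}]  median={mid:.2f}")
```

In a sketch like this the work grows with the number of trials times the subset size times the number of thresholds, which is one reason a 300k-row data set can be slow. Yellowbrick's `DiscriminationThreshold` constructor documents an `n_trials` argument, so lowering it, or fitting the visualizer on a random sample of the data, is worth trying; check the linked docs for the exact defaults.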