
For a multilabel classification problem, I am trying to plot a precision-recall curve.

The sample code is taken from the scikit-learn example at "https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html#sphx-glr-auto-examples-model-selection-plot-precision-recall-py", under the section "Create multi-label data, fit, and predict".

I am trying to adapt it to my code, but I get the error "ValueError: Can only tuple-index with a MultiIndex" when I run the code below.

train_df.columns.values

array(['DefId', 'DefectCount', 'SprintNo', 'ReqName', 'AreaChange',
       'CodeChange', 'TestSuite'], dtype=object)

TestSuite is the target column to be predicted.

X_train = train_df.drop("TestSuite", axis=1)
Y_train = train_df["TestSuite"]
X_test  = test_df.drop("DefId", axis=1).copy()

For the classes argument, I have hard-coded the TestSuite values.

import numpy as np
from sklearn.preprocessing import label_binarize

# Use label_binarize to get a multi-label-like indicator matrix
Y = label_binarize(Y_train, classes=np.array([0, 1, 2, 3, 4]))
n_classes = Y.shape[1]
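To illustrate what the label_binarize step produces, here is a small sketch on a hypothetical list of six TestSuite labels (not the real data): the 1-D label vector becomes an (n_samples, n_classes) array with one 0/1 indicator column per class, which is the shape that indexing like Y[:, i] expects.

```python
from sklearn.preprocessing import label_binarize

# Hypothetical TestSuite labels for six rows (classes 0-4, as in the question).
y = [0, 3, 1, 4, 2, 3]
Y = label_binarize(y, classes=[0, 1, 2, 3, 4])

print(Y.shape)   # (6, 5): one row per sample, one column per class
print(Y[:, 3])   # [0 1 0 0 0 1]: indicator column for class 3
```

Each row contains exactly one 1, since every sample here has a single label.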



# We use OneVsRestClassifier for multi-label prediction
from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier

# Run classifier
classifier = OneVsRestClassifier(svm.LinearSVC(random_state=3))
classifier.fit(X_train, Y_train)
y_score = classifier.decision_function(X_test)


from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score
import pandas as pd

# For each class
precision = dict()
recall = dict()
average_precision = dict()
#n_classes = Y.shape[1]
for i in range(n_classes):
    # Y_train is still the 1-D pandas Series here, so Y_train[:, i] raises the ValueError.
    # Note also that y_score was computed on X_test, so its rows do not line up with Y_train.
    precision[i], recall[i], _ = precision_recall_curve(Y_train[:, i], y_score[:, i])
    average_precision[i] = average_precision_score(Y_train[:, i], y_score[:, i])
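Below is a minimal, self-contained sketch of the per-class loop working end to end. Since the real train_df/test_df are not available, it assumes a synthetic stand-in built with make_classification (six features, five classes); the split sizes and random seeds are arbitrary choices. It applies the label_binarize fix from the comments to the labels, and it also holds out labelled test rows, because precision_recall_curve needs the true labels of the same rows that decision_function scored.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.svm import LinearSVC

# Hypothetical stand-in for train_df/test_df: 6 features, 5 TestSuite classes (0-4).
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=5, n_clusters_per_class=1,
                           random_state=0)

# Keep true labels for the held-out rows: the PR curve compares them with the scores.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

classes = [0, 1, 2, 3, 4]
# Binarize both label vectors into (n_samples, n_classes) NumPy arrays,
# so that Y[:, i] selects the indicator column for class i.
Y_train = label_binarize(y_train, classes=classes)
Y_test = label_binarize(y_test, classes=classes)
n_classes = Y_train.shape[1]

classifier = OneVsRestClassifier(LinearSVC(random_state=3))
classifier.fit(X_train, Y_train)
y_score = classifier.decision_function(X_test)  # shape (n_test, n_classes)

precision, recall, average_precision = {}, {}, {}
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(Y_test[:, i], y_score[:, i])
    average_precision[i] = average_precision_score(Y_test[:, i], y_score[:, i])

# Micro-average: pool every (sample, class) decision into a single curve.
precision["micro"], recall["micro"], _ = precision_recall_curve(
    Y_test.ravel(), y_score.ravel())
average_precision["micro"] = average_precision_score(Y_test, y_score, average="micro")

for key, ap in average_precision.items():
    print(key, round(float(ap), 3))
```

The precision and recall dicts have the same layout as in the linked scikit-learn example, so its plotting code should apply to them unchanged.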

Input data: the values have been categorised.

Priya C
  • Please provide some sample data of the DataFrame that are you trying to index when this error is thrown. – w-m Feb 20 '19 at 14:27
  • I have updated the question with a screenshot of the input feed. Please have a look. Note: I have categorised the input feed as 0, 1, 2, 3. – Priya C Feb 25 '19 at 06:17
  • Sorry, but the question is still not clear. Please provide data as text, not as image. But also, in the image, the data is called train_df, a string that doesn't appear in your code at all. Best rework your question into a small, reproducible example, then we can help you. Get some helpful hints how to do that here: https://stackoverflow.com/a/20159305/463796 – w-m Feb 25 '19 at 11:04
  • Added a few details; hope this helps. Please let me know if any other details are needed. – Priya C Feb 26 '19 at 12:01
  • Unfortunately it's still not a program I can run locally to get the same result as you do. Please provide reproducible code that somebody else can just copy and run. Find in the link above ideas how to make your data available. Try to condense your problem to the few lines that really matter in your question. – w-m Feb 26 '19 at 12:28
  • I have rectified the issue. I just added this line: Y_train = label_binarize(Y_train, classes=[0,1,2,3,4]). The issue was that Y_train is a 1-D array, not a 5-column (one indicator column per class) array. – Priya C Feb 27 '19 at 09:48
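The cause described in the last comment can be reproduced in isolation. This sketch uses a hypothetical six-row Series in place of train_df["TestSuite"]: two-dimensional indexing such as Y_train[:, 0] on a 1-D pandas Series fails (older pandas raises "ValueError: Can only tuple-index with a MultiIndex"; newer versions raise a KeyError with a different message, so both are caught), while the binarized NumPy array supports column indexing.

```python
import pandas as pd
from sklearn.preprocessing import label_binarize

# Hypothetical stand-in for train_df["TestSuite"]: a 1-D pandas Series.
Y_train = pd.Series([0, 3, 1, 4, 2, 3], name="TestSuite")

try:
    Y_train[:, 0]  # 2-D indexing on a 1-D Series
    failed = False
except (ValueError, KeyError) as exc:
    # The exception type and message depend on the pandas version.
    failed = True
    print(type(exc).__name__, exc)

# The fix from the comment: binarize into an (n_samples, 5) NumPy array.
Y_train_bin = label_binarize(Y_train, classes=[0, 1, 2, 3, 4])
print(Y_train_bin[:, 0])  # [1 0 0 0 0 0]
```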
