
I want to plot the train AUC and CV AUC against tree depth for a decision tree model, while also varying min_samples_split as shown in the code. If I fix min_samples_split to a single value (5 or 10), the curve is plotted. But if I use two values, min_samples_split = [5, 10], I get ValueError: x and y must have same first dimension, but have shapes (5,) and (10,). I understand the error, but how can I get this plotted?

train_auc = []
cv_auc = []


depth =  [1, 5, 10, 50, 100]
k = [5, 10]

for i in depth :
    for p in k :
        clf = DecisionTreeClassifier(criterion='gini', max_depth= i , 
        min_samples_split= p , class_weight = 'balanced' )
        clf.fit(X_train, y_train)


        y_train_pred = clf.predict(X_train)    
        y_cv_pred = clf.predict(X_cv)


        train_auc.append(roc_auc_score(y_train,y_train_pred))
        cv_auc.append(roc_auc_score(y_cv, y_cv_pred))



plt.plot(depth , train_auc, label='Train AUC')
plt.plot(depth , cv_auc,  label='CV AUC')

plt.scatter(depth , train_auc,  label='Train AUC points')
plt.scatter(depth , cv_auc ,  label='CV AUC points')


plt.legend()
plt.xlabel("depth")
plt.ylabel("AUC")
plt.title("ERROR PLOTS")
plt.grid()
plt.show()
Andrew
  • Please include all `import` lines and data sample for **runnable**, reproducible code block. See [MCVE] and (if using `pandas`) [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Parfait Jun 17 '19 at 14:35

1 Answer


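Your code is not runnable, but from its logic the cause is that both loops append to the same lists, so train_auc and cv_auc end up with len(depth) * len(k) = 10 values while depth has only 5. One solution is to loop over min_samples_split on the outside, reset the lists for each value, and move the plotting lines inside that loop so every value gets its own curve.

Try doing something like the following.

import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score
from sklearn.tree import DecisionTreeClassifier

depth = [1, 5, 10, 50, 100]
k = [5, 10]

# Outer loop over min_samples_split: one pair of curves per value
for p in k:
    train_auc = []  # reset for each curve, so it holds one AUC per depth
    cv_auc = []
    for i in depth:
        clf = DecisionTreeClassifier(criterion='gini', max_depth=i,
                                     min_samples_split=p, class_weight='balanced')
        clf.fit(X_train, y_train)

        y_train_pred = clf.predict(X_train)
        y_cv_pred = clf.predict(X_cv)

        train_auc.append(roc_auc_score(y_train, y_train_pred))
        cv_auc.append(roc_auc_score(y_cv, y_cv_pred))

    # len(train_auc) == len(depth) here, so x and y have matching shapes
    plt.plot(depth, train_auc, marker='o', label=f'Train AUC (min_samples_split={p})')
    plt.plot(depth, cv_auc, marker='o', label=f'CV AUC (min_samples_split={p})')

plt.legend()
# rest of the code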
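
If you would rather keep the original loop order (depth outside, min_samples_split inside) and a single flat list, another option is to regroup that list after the loops. The sketch below is only an illustration: split_by_inner is a hypothetical helper name, and the scores are made-up placeholders standing in for real roc_auc_score results. It relies on the nested loops appending values in the order (depth[0], k[0]), (depth[0], k[1]), (depth[1], k[0]), and so on.

```python
# Hypothetical helper (not from the original code): regroup a flat list that was
# filled by "for i in depth: for p in k: append(...)" into one list per value of p.
def split_by_inner(values, n_inner):
    """Return n_inner lists; list j holds every n_inner-th value starting at index j."""
    if len(values) % n_inner != 0:
        raise ValueError("length of values must be a multiple of n_inner")
    return [values[j::n_inner] for j in range(n_inner)]


depth = [1, 5, 10, 50, 100]
k = [5, 10]

# Placeholder numbers, made up for illustration only, in the order the
# nested loops would append them: (d=1, p=5), (d=1, p=10), (d=5, p=5), ...
train_auc = [0.60, 0.61, 0.70, 0.71, 0.80, 0.81, 0.90, 0.91, 0.95, 0.96]

curves = split_by_inner(train_auc, len(k))

for p, curve in zip(k, curves):
    # Each curve has len(depth) points, so plt.plot(depth, curve) would get
    # x and y of the same length.
    print(p, len(curve), curve)
```

Each list in curves can then be passed to plt.plot(depth, curve, label=...) in place of the full 10-element list.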
Sheldore