After reading a few tutorials, this is the first time I have built a Keras Deep Learning Model as I am a beginner in machine learning and deep learning. Most of the tutorials use the train-test split to train and test the model. However, I chose to use StratifiedKFold CV. The code is as below.
X = dataset[:,0:80].astype(float)
Y = dataset[:,80]
kfold = StratifiedKFold(n_splits=10,random_state=seed)
for train, test in kfold.split(X, Y):
# create model
model = Sequential()
model.add(Dense())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='Adam',metrics=['accuracy'])
model.fit(X[train], Y[train], epochs=100,batch_size=128, verbose=0)
scores = model.evaluate(X[test], Y[test], verbose=1)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
cvscores.append(scores[1] * 100)
print("%.2f%% (+/- %.2f%%)" % (numpy.mean(cvscores), numpy.std(cvscores)))
Y[pred]= model.predict(X[test])
acc = accuracy_score(Y[test],Y[pred])
confusion = confusion_matrix(Y[test], Y[pred])
print(confusion)
plot_confusion_matrix(confusion, classes =['No','Yes'],title='Confusion Matrix')
TP= confusion[1,1]
TN= confusion[0,0]
FP= confusion[0,1]
FN= confusion[1,0]
print('Accuracy: ')
print((TP + TN) / float(TP + TN + FP + FN))
print(accuracy_score(Y[test],Y[pred]))
fpr, tpr, thresholds = roc_curve(Y[test], y_pred_prob)
plt.plot(fpr, tpr)
print(roc_auc_score(y_test, y_pred_prob))
y_pred_class = binarize([y_pred_prob], 0.3)[0]
confusion_new = confusion_matrix(Y[test], y_pred_class)
print(confusion_new)
I have understood the theoretical concept of Kfold CV and StratifiedKFoldCV. I have come across What does KFold in python exactly do?, KFolds Cross Validation vs train_test_split, and a few more links. But when I calculate the performance metrics it gives me the following errors.
NameError: name 'pred' is not defined
NameError: name 'y_pred_prob' is not defined
NameError: name 'roc_curve' is not defined
What I am doing wrong here? Why am I getting these errors? How do I fix this?
Thanks.