I am performing URL classification (phishing vs. non-phishing) and I plotted the learning curves (training vs. cross-validation score) for my model (Gradient Boosting).
My View
It seems that the two curves converge and the difference is not significant (it's normal for the training set to have a slightly higher accuracy). (Figure 1)
The Question
I have limited experience on machine learning, thus I am asking your opinion. Is the way I am approaching the problem right? Is this model fine or is it overfitting?
Note: The classes are balanced and the features are well chosen
Relevant code
import numpy as np
from sklearn.model_selection import StratifiedKFold
from yellowbrick.model_selection import LearningCurve

def plot_learning_curves(X, y, model):
    # Create the learning curve visualizer
    cv = StratifiedKFold(n_splits=5)
    sizes = np.linspace(0.1, 1.0, 8)
    visualizer = LearningCurve(model, cv=cv, train_sizes=sizes, n_jobs=4)
    visualizer.fit(X, y)  # Fit the data to the visualizer
    visualizer.show()     # poof() was renamed show() in yellowbrick 1.0
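To complement reading the gap off the plot, it can be measured numerically with scikit-learn's learning_curve, which yellowbrick wraps internally. This is only a sketch: the synthetic data from make_classification is a stand-in for the real URL feature matrix, and the model hyperparameters are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, learning_curve

# Synthetic balanced binary data as a placeholder for the URL features
X, y = make_classification(n_samples=800, n_features=20, random_state=0)

model = GradientBoostingClassifier(n_estimators=50, random_state=0)
cv = StratifiedKFold(n_splits=5)

sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=cv, train_sizes=np.linspace(0.1, 1.0, 8), n_jobs=4
)

# A small gap that shrinks as training size grows suggests the model is
# not badly overfitting; a large, persistent gap would indicate high variance.
gap = train_scores.mean(axis=1) - val_scores.mean(axis=1)
print("final train/CV gap:", gap[-1])
```

If the final gap stays small (a few percentage points) and the cross-validation score has plateaued, the convergence seen in the plot is a good sign rather than evidence of overfitting.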