The situation you describe is practically identical to one found in a documentation example, which uses the first 2 classes of the iris data and a `LinearSVC` classifier (the algorithm uses the squared hinge loss, which, like the hinge loss you use here, results in a classifier that produces only binary outcomes and not probabilistic ones). The resulting plot there is qualitatively similar to yours here.
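Here is a minimal sketch of that setup, assuming a recent scikit-learn release. It keeps the first two iris classes, fits a `LinearSVC` (squared hinge loss by default) and, for comparison, an `SGDClassifier` with the hinge loss. It then checks which scoring methods each classifier exposes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.svm import LinearSVC

# Keep only the first 2 iris classes (setosa = 0, versicolor = 1)
X, y = load_iris(return_X_y=True)
X, y = X[y < 2], y[y < 2]

svc = LinearSVC(max_iter=10000).fit(X, y)                  # squared hinge loss by default
sgd = SGDClassifier(loss="hinge", random_state=0).fit(X, y)

for name, clf in [("LinearSVC", svc), ("SGDClassifier(hinge)", sgd)]:
    # Neither classifier offers probabilities, but both offer a real-valued score
    print(name,
          "predict_proba:", hasattr(clf, "predict_proba"),
          "decision_function:", hasattr(clf, "decision_function"))
    # predict() itself only ever returns hard class labels
    print("  distinct predict() outputs:", np.unique(clf.predict(X)))
```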
Nevertheless, your question is a legitimate one and a nice catch indeed: how come we get behavior similar to that of probabilistic classifiers, when our classifier does not produce probabilistic predictions at all (and hence any notion of a threshold seems irrelevant)?
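The threshold issue can be made concrete with `sklearn.metrics.precision_recall_curve`. If you feed it the hard 0/1 labels from `predict()`, it can find at most two distinct thresholds, so the "curve" degenerates to a couple of points. If you feed it the real-valued `decision_function()` scores instead, it traces out many. A minimal sketch, again assuming the two-class iris setup:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import precision_recall_curve
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X, y = X[y < 2], y[y < 2]          # first 2 classes only
clf = LinearSVC(max_iter=10000).fit(X, y)

# Hard labels: only the values 0 and 1 are available as cut-offs
_, _, thr_labels = precision_recall_curve(y, clf.predict(X))
# Continuous scores: every distinct score is a candidate cut-off
_, _, thr_scores = precision_recall_curve(y, clf.decision_function(X))

print(len(thr_labels), len(thr_scores))
```

Each threshold yields one (precision, recall) point. This is why a real curve needs scores rather than labels.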
To see why this is so, we need to do some digging into the scikit-learn source code, starting from the `plot_precision_recall_curve` function used here and following the thread down the rabbit hole...

In the source code of `plot_precision_recall_curve`, we find:
```python
y_pred, pos_label = _get_response(
    X, estimator, response_method, pos_label=pos_label)
```
So, for the purposes of plotting the PR curve, the predictions `y_pred` are not produced directly by the `predict` method of our classifier, but by the internal scikit-learn function `_get_response()`.
`_get_response()` in turn includes the lines:

```python
prediction_method = _check_classifier_response_method(
    estimator, response_method)
y_pred = prediction_method(X)
```
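The following simplified, hypothetical sketch shows roughly what happens next to that raw output for a binary classifier. The helper name `to_positive_scores` is illustrative and not part of scikit-learn. A 2-D `predict_proba` output is reduced to the column of the positive class. A 1-D `decision_function` output is used as is, with its sign flipped if the positive class is `classes_[0]`, since larger decision values favour `classes_[1]`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC


def to_positive_scores(y_pred, classes, pos_label=None):
    """Hypothetical helper: turn raw binary-classifier output into 1-D positive-class scores."""
    if pos_label is None:
        pos_label = classes[1]              # default positive class
    if y_pred.ndim == 2:
        # predict_proba output: keep the probability column of the positive class
        col = np.flatnonzero(classes == pos_label)[0]
        return y_pred[:, col], pos_label
    if pos_label == classes[0]:
        # decision_function output: large values favour classes[1], so flip the sign
        return -y_pred, pos_label
    return y_pred, pos_label


X, y = load_iris(return_X_y=True)
X, y = X[y < 2], y[y < 2]

svc = LinearSVC(max_iter=10000).fit(X, y)
logreg = LogisticRegression(max_iter=1000).fit(X, y)

svc_scores, _ = to_positive_scores(svc.decision_function(X), svc.classes_)
lr_scores, _ = to_positive_scores(logreg.predict_proba(X), logreg.classes_)
flipped, label = to_positive_scores(svc.decision_function(X), svc.classes_, pos_label=0)
print(svc_scores.shape, lr_scores.shape, label)
```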
This finally leads us to the internal function `_check_classifier_response_method()`. You can check its full source code; what is of interest here are the following 3 lines after the `else` statement (the branch taken with the default `response_method="auto"`):

```python
predict_proba = getattr(estimator, 'predict_proba', None)
decision_function = getattr(estimator, 'decision_function', None)
prediction_method = predict_proba or decision_function
```
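Those three lines rely on Python's `or` returning its first truthy operand. A bound method is truthy, while `getattr(..., None)` yields `None` (falsy) when the attribute is missing, including when accessing it raises `AttributeError`, as with a hinge-loss `SGDClassifier`. Below is a minimal stand-alone sketch of the same fallback. The function name `pick_prediction_method` is hypothetical, and `LogisticRegression` is used only as an example of a classifier that does have `predict_proba`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import LinearSVC


def pick_prediction_method(estimator):
    """Hypothetical helper mirroring the 'auto' fallback: predict_proba first, else decision_function."""
    predict_proba = getattr(estimator, "predict_proba", None)          # None if unavailable
    decision_function = getattr(estimator, "decision_function", None)
    prediction_method = predict_proba or decision_function            # first non-None wins
    if prediction_method is None:
        raise ValueError("estimator has neither predict_proba nor decision_function")
    return prediction_method


X, y = load_iris(return_X_y=True)
X, y = X[y < 2], y[y < 2]

for clf in (LogisticRegression(max_iter=1000), LinearSVC(max_iter=10000),
            SGDClassifier(loss="hinge", random_state=0)):
    clf.fit(X, y)
    print(type(clf).__name__, "->", pick_prediction_method(clf).__name__)
```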
By now, you may have started getting the point. Under the hood, `plot_precision_recall_curve` checks whether a `predict_proba()` or a `decision_function()` method is available for the classifier used. If `predict_proba()` is not available, as in your case of an `SGDClassifier` with hinge loss (or the documentation example of a `LinearSVC` classifier with squared hinge loss), it falls back to the `decision_function()` method to calculate the `y_pred` that is subsequently used for plotting the PR (and ROC) curve.
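As a side note, assume a linear classifier such as `SGDClassifier` or `LinearSVC`; in scikit-learn, both derive from `LinearClassifierMixin`. For such a classifier, the hard `predict()` output in the binary case is just the `decision_function()` score thresholded at 0. The classifier's actual predictions therefore correspond to one particular cut-off along the same score axis the PR curve is built from. A minimal sketch to check this on the two-class iris data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import precision_score, recall_score

X, y = load_iris(return_X_y=True)
X, y = X[y < 2], y[y < 2]
clf = SGDClassifier(loss="hinge", random_state=0).fit(X, y)

scores = clf.decision_function(X)                  # what the PR curve is built from
hard = clf.classes_[(scores > 0).astype(int)]      # threshold the scores at 0

# predict() gives exactly the same labels as thresholding decision_function at 0
print(np.array_equal(hard, clf.predict(X)))
# ...so its precision/recall is the operating point at cut-off 0 on that score axis
print(precision_score(y, hard), recall_score(y, hard))
```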
The above has arguably answered your programming question about how exactly scikit-learn produces the plot and the underlying calculations in such cases. Further theoretical questions about whether and why using the `decision_function()` of a non-probabilistic classifier is a correct and legitimate way to get a PR (or ROC) curve are out of scope for SO, and they should be addressed to Cross Validated, if necessary.