Embed Auxiliary Label Confidence Data in Scikit-learn Training

Asked Apr 27 '20 at 17:59

Active May 02 '20 at 11:18

Viewed 124 times

I have a single-label classification dataset with an auxiliary column that records how confident we are in the source of each label. The confidence values are categorical (low, medium, high) and stem from the process that labeled the sample. For example, in a music classification dataset, we are more confident in the labels of songs that were labeled by an actual musician.

Does Scikit-learn provide any explicit means of incorporating this auxiliary confidence information to learn a better model?
If not, what would be some sound alternatives within Scikit-learn? Would it be reasonable to assign each sample a weight proportional to its confidence level and let that weight scale the sample's contribution to the loss function [1, 2]? For example, the fit method of sklearn.linear_model.LogisticRegression accepts a sample_weight vector, which looks relevant.
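
To make the weighting idea concrete, here is a minimal sketch of what I have in mind. The data is synthetic, and the category-to-weight mapping (1, 2, 3) and the helper name confidence_to_weights are purely illustrative assumptions, not values I have validated:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical mapping from confidence category to a per-sample weight.
# The actual values are an assumption and would need tuning on held-out data.
CONFIDENCE_WEIGHTS = {"low": 1.0, "medium": 2.0, "high": 3.0}


def confidence_to_weights(confidence, mapping=CONFIDENCE_WEIGHTS):
    """Translate categorical confidence labels into a float weight vector."""
    return np.array([mapping[c] for c in confidence], dtype=float)


# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
rng = np.random.default_rng(0)
confidence = rng.choice(["low", "medium", "high"], size=len(y))

# Each sample's loss term is scaled by its weight during fitting.
sample_weight = confidence_to_weights(confidence)
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y, sample_weight=sample_weight)
```

With this setup a high-confidence sample counts three times as much as a low-confidence one in the fitted objective. Whether such a linear scale is appropriate is part of what I am asking.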

edited May 02 '20 at 11:18

Reveille

0 Answers