I have a simple example for which I am attempting to perform classification using MLPClassifier.

from sklearn.neural_network import MLPClassifier

# What are the features in our X data?
#  0. do .X files exist?
#  1. do .Y files exist?
#  2. does a Z.Z file exist?
# values are 0 for false and 1 for true

training_x = (
    [0,1,0],  # pure .Y files, no Z.Z
    [1,0,1],  # .X files and Z.Z
    [1,0,0],  # .X files, w/o Z.Z
)
training_y = ('.Y, no .X, no Z.Z', '.X + Z.Z', '.X w/o Z.Z')
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(len(training_x)+1, len(training_x)+1), random_state=1)
# training
clf.fit(training_x, training_y)
# predictions
for i in (0,1):
    for j in (0,1):
        for k in (0,1):
            results = list(clf.predict_proba([[i, j, k]])[0])
            # seems they are reversed:
            results.reverse()
            discrete_results = None
            for index in range(len(training_x)):
                if results[index] > 0.999:
                    if discrete_results is not None:
                        print('hold on a minute')
                    discrete_results = training_y[index]
            print(f'{i},{j},{k} ==> {results}, discrete={discrete_results}')

When I test it with all possible (discrete) inputs, I would expect the predictions for the input cases [0,1,0], [1,0,1] and [1,0,0] to closely match my three training_y cases; for the other input cases the results would be ill-defined and not of interest. However, those three input cases are not matched at all unless I reverse the predict_proba results, and even then only the [0,1,0] input matches while the other two are swapped. Here is the output with the reverse included:

0,0,0 ==> [1.1527971240749179e-19, 0.0029561479916546647, 0.9970438520083453], discrete=None
0,0,1 ==> [0.9999549772644907, 3.686866933257315e-08, 4.498586684013346e-05], discrete=.Y, no .X, no Z.Z
0,1,0 ==> [0.9999549772644907, 3.686866933257315e-08, 4.498586684013346e-05], discrete=.Y, no .X, no Z.Z
0,1,1 ==> [0.9999549772644907, 3.686866933257315e-08, 4.498586684013346e-05], discrete=.Y, no .X, no Z.Z
1,0,0 ==> [4.971668615064256e-68, 0.9999999980156198, 1.9843802638506693e-09], discrete=.X + Z.Z
1,0,1 ==> [1.3622448606166547e-05, 3.911037287197552e-05, 0.9999472671785217], discrete=.X w/o Z.Z
1,1,0 ==> [3.09415772026147e-33, 0.934313523906787, 0.06568647609321301], discrete=None
1,1,1 ==> [0.9999549772644907, 3.686866933257315e-08, 4.498586684013346e-05], discrete=.Y, no .X, no Z.Z

I have, no doubt, made a silly beginner's error! Help with finding it would be appreciated.

  • You should not expect to get anything meaningful with only 3 training samples; NNs do not work with such symbolic reasoning - they need *quantities* of data! The *spirit* of my answer here might be helpful: [Neural network for square (x^2) approximation](https://stackoverflow.com/questions/55170460/neural-network-for-square-x2-approximation) – desertnaut May 06 '22 at 14:40
  • Thanks, @desertnaut. I appreciate the feedback. It makes sense to me. I increased my training set by replicating the same data 10,000 times. Unfortunately, it generates the same incorrect predictions in 2 of the 3 cases (or 3 of 3 if I don't reverse the results). I must still be missing something. – Kevin Buchs May 06 '22 at 15:48

1 Answer

The order of probabilities from predict_proba is not "reversed"; the columns follow the sorted (here, alphabetical) class order, which you can check in the attribute classes_. And instead of discretizing yourself at the threshold 0.999, consider calling predict, which takes the class with the largest probability and, more importantly, translates it back to the text of the class internally.
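For example, here is a minimal sketch re-using the training data from the question; it prints the learned class order and lets predict do the label mapping:

from sklearn.neural_network import MLPClassifier

training_x = [
    [0, 1, 0],  # pure .Y files, no Z.Z
    [1, 0, 1],  # .X files and Z.Z
    [1, 0, 0],  # .X files, no Z.Z
]
training_y = ['.Y, no .X, no Z.Z', '.X + Z.Z', '.X w/o Z.Z']

clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(4, 4), random_state=1)
clf.fit(training_x, training_y)

# The columns of predict_proba follow clf.classes_ (sorted),
# not the order the labels were given in training_y.
print(clf.classes_)

# predict() picks the most probable class and returns the original label text.
for x in training_x:
    print(x, '==>', clf.predict([x])[0])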

  • Thanks, @Ben. I now see the order of the classes is arbitrary. That doesn't make sense to implement it that way, but at least I know how to work with it now. I was looking at the probabilities hoping to detect inputs for which the model was not trained. Since there is discrete matching, I did not want guesses, but certainty. However, with the 30k samples now, I get 8-9s of probability for predictions that were not trained. I can see my next step must be to train for all input cases, with undefined inputs returning an undefined output. Thanks for your help! – Kevin Buchs May 10 '22 at 13:47
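As a hypothetical sketch of that next step (this is not from the original thread; the 'undefined' label is an assumed placeholder class), one could enumerate all eight possible inputs and give the unused combinations an explicit class:

from itertools import product
from sklearn.neural_network import MLPClassifier

# The three labelled cases from the question.
known = {
    (0, 1, 0): '.Y, no .X, no Z.Z',
    (1, 0, 1): '.X + Z.Z',
    (1, 0, 0): '.X w/o Z.Z',
}

# Assign every other input combination a hypothetical 'undefined' class.
all_inputs = [list(bits) for bits in product((0, 1), repeat=3)]
all_labels = [known.get(tuple(x), 'undefined') for x in all_inputs]

clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(4, 4), random_state=1)
clf.fit(all_inputs, all_labels)

for x in all_inputs:
    print(x, '==>', clf.predict([x])[0])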