Why can't sklearn MLPClassifier predict xor?

Question

In theory, an MLP with a single hidden layer with just 3 neurons is enough to predict xor correctly. It could sometimes fail to converge properly, but 4 neurons are a safe bet.
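As a quick sanity check that a small hidden layer can represent this problem, here is one hand-built set of weights for a single hidden layer of 4 ReLU units. The weights are my own illustrative construction: they are not learned and not taken from the Playground. The sketch assumes the same data as the question, with points uniform in [-1, 1]^2 labelled by the XOR of the signs of the two coordinates. The two pairs of hidden units compute |x0 + x1| and |x0 - x1|, and |x0 - x1| - |x0 + x1| is positive exactly when the coordinates have opposite signs.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hand-picked (not learned) weights for a 2-4-1 network.
# Hidden units compute relu(x0 + x1), relu(-x0 - x1), relu(x0 - x1), relu(x1 - x0).
W1 = np.array([[1.0, -1.0,  1.0, -1.0],
               [1.0, -1.0, -1.0,  1.0]])  # shape (2 inputs, 4 hidden units)
b1 = np.zeros(4)
# relu(s) + relu(-s) == |s|, so the output is |x0 - x1| - |x0 + x1|,
# which is positive exactly when x0 and x1 have opposite signs.
w2 = np.array([-1.0, -1.0, 1.0, 1.0])
b2 = 0.0

def predict(x):
    hidden = relu(x @ W1 + b1)
    return (hidden @ w2 + b2) > 0

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (10000, 2))
y = (x[:, 0] > 0) ^ (x[:, 1] > 0)  # XOR of the signs, as in the question
accuracy = np.mean(predict(x) == y)
print(f"accuracy: {accuracy}")
```

This only shows that suitable weights exist. Whether gradient descent actually finds them from a random start is a separate question, and that is what the rest of this thread is about.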

Here's an example on the TensorFlow Playground.

I've tried to reproduce this using sklearn.neural_network.MLPClassifier:

from sklearn import neural_network
from sklearn.metrics import accuracy_score, precision_score, recall_score
import numpy as np


x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
y_train = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1

model = neural_network.MLPClassifier(
    hidden_layer_sizes=(3,), n_iter_no_change=100,
    learning_rate_init=0.01, max_iter=1000
).fit(x_train, y_train)

x_test = np.random.uniform(-1, 1, (1000, 2))
tmp = x_test > 0
y_test = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1

prediction = model.predict(x_test)
print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')

I only get around 0.75 accuracy, while the TensorFlow Playground model is perfect. Any idea what makes the difference?

I also tried using TensorFlow:

import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score, precision_score, recall_score

model = tf.keras.Sequential(layers=[
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(1)  # no activation: the output is an unbounded linear score
])

# No optimizer given, so Keras falls back to its default ('rmsprop')
model.compile(loss=tf.keras.losses.binary_crossentropy)

x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
y_train = (tmp[:, 0] ^ tmp[:, 1])  # boolean XOR labels

model.fit(x=x_train, y=y_train)  # no epochs given, so Keras trains for 1 epoch

x_test = np.random.uniform(-1, 1, (1000, 2))
tmp = x_test > 0
y_test = (tmp[:, 0] ^ tmp[:, 1])

prediction = model.predict(x_test) > 0.5
print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')

With this model I get results similar to the scikit-learn model, so it's not just a scikit-learn issue. Am I missing some important hyper-parameter?

Edit

OK, I changed the loss to mean squared error instead of cross-entropy, and the TensorFlow example now gets 0.92 accuracy. I guess that's the problem with the MLPClassifier?

1) you cannot change the loss at will - the loss is dictated by the problem itself; MSE is for regression problems and CE for classification ones. 2) With MSE (i.e. regression setting) [accuracy is meaningless](https://stackoverflow.com/questions/48775305/what-function-defines-accuracy-in-keras-when-the-loss-is-mean-squared-error-mse/48788577#48788577) (same for all classification metrics, i.e. precision, recall etc). — desertnaut, Jul 23 '20 at 16:43
@desertnaut You're right that CE is associated with classification and MSE with regression, which is why I used CE in the first place. But since I apply a threshold to the result, I end up with a classifier anyway, so all these metrics still apply regardless of the loss. Apparently MSE does better here, for some reason I cannot understand, even though it is a classification problem. — feature_engineer, Jul 26 '20 at 04:48
This may or may not be a good idea; see the last part of my own answer [here](https://stackoverflow.com/questions/38015181/accuracy-score-valueerror-cant-handle-mix-of-binary-and-continuous-target/54458777#54458777) and the comment therein. — desertnaut, Jul 26 '20 at 11:29
@desertnaut Right, since the data is symmetric, this is the special case Andrew Ng shows at the beginning of his argument, where linear regression does quite well. — feature_engineer, Jul 26 '20 at 14:30
@desertnaut Actually looking at the playground code: https://github.com/tensorflow/playground/tree/master/src (nn and playground are the relevant ones) - it seems they're using MSE for the classification loss, with tanh activation on the output :) — feature_engineer, Jul 26 '20 at 15:54
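The comments note that the Playground appears to use a squared-error loss with a tanh output. Below is a minimal NumPy sketch of that kind of setup, under my own simplifying assumptions: labels in {-1, +1}, 4 tanh hidden units, full-batch gradient descent, and a hand-picked learning rate and initialization. None of these details are taken from the Playground source. The continuous output is thresholded at 0 to get a class, which is the thresholding step debated in the comments above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data as in the question: label +1 when exactly one coordinate is positive, else -1.
x = rng.uniform(-1, 1, (2000, 2))
y = np.where((x[:, 0] > 0) ^ (x[:, 1] > 0), 1.0, -1.0)

n_hidden = 4
W1 = rng.normal(0, 0.5, (2, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(0, 0.5, n_hidden)
b2 = 0.0
lr = 0.5  # assumed value, not taken from the Playground

def forward(x):
    h = np.tanh(x @ W1 + b1)     # hidden layer, tanh activation
    out = np.tanh(h @ w2 + b2)   # tanh output in (-1, 1)
    return h, out

def mse(out, y):
    return 0.5 * np.mean((out - y) ** 2)

initial_loss = mse(forward(x)[1], y)
for step in range(5000):
    h, out = forward(x)
    # Backpropagate 0.5 * mean((out - y)^2) through both tanh layers.
    d_out = (out - y) / len(y)
    d_z2 = d_out * (1 - out ** 2)
    grad_w2 = h.T @ d_z2
    grad_b2 = d_z2.sum()
    d_z1 = np.outer(d_z2, w2) * (1 - h ** 2)
    grad_W1 = x.T @ d_z1
    grad_b1 = d_z1.sum(axis=0)
    # Full-batch gradient descent step
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    w2 -= lr * grad_w2
    b2 -= lr * grad_b2

final_out = forward(x)[1]
final_loss = mse(final_out, y)
# Threshold the continuous output at 0 to turn the regression-style score into a class.
accuracy = np.mean(np.sign(final_out) == y)
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, training accuracy {accuracy:.3f}")
```

How well this converges depends on the seed, learning rate and number of steps, so treat the printed numbers as something to experiment with rather than as a reproduction of the Playground's results.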

score 1 · Accepted Answer · answered Jul 23 '20 at 19:57


Increasing the learning rate and/or the maximum number of iterations seems to make the sklearn version work. Different solvers probably need different values for these, and it's not clear to me what the TF Playground is using.
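A minimal sketch of that experiment, comparing the question's settings with a larger `learning_rate_init` and `max_iter`. The specific values (0.1 and 3000), the smaller training set, and the fixed `random_state` are my own illustrative assumptions, not tuned recommendations, and the outcome may vary with the seed:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_xor(n):
    """Points uniform in [-1, 1]^2, labelled 1 when exactly one coordinate is positive."""
    x = rng.uniform(-1, 1, (n, 2))
    y = ((x[:, 0] > 0) ^ (x[:, 1] > 0)).astype(int)
    return x, y

x_train, y_train = make_xor(2000)
x_test, y_test = make_xor(1000)

# Question's settings vs. a larger learning rate and more iterations (illustrative values).
settings = {
    "question": dict(learning_rate_init=0.01, max_iter=1000),
    "larger lr / more iterations": dict(learning_rate_init=0.1, max_iter=3000),
}
results = {}
for name, kwargs in settings.items():
    model = MLPClassifier(hidden_layer_sizes=(3,), n_iter_no_change=100,
                          random_state=0, **kwargs)
    model.fit(x_train, y_train)  # may emit a ConvergenceWarning if max_iter is reached
    results[name] = accuracy_score(y_test, model.predict(x_test))
    print(f"{name}: accuracy={results[name]:.3f}, epochs run={model.n_iter_}")
```

`MLPClassifier` uses the `adam` solver by default; switching `solver` would likely require different values again.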


Ben Reiniger
