
I have programmed a multilayer perceptron for binary classification. As I understand it, a network with one hidden layer can be represented using just lines as decision boundaries (one line per hidden neuron). This works well, and the lines can easily be plotted from the resulting weights after training.
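For the one-hidden-layer case described above, the per-neuron lines can be drawn directly from the first-layer parameters. A minimal sketch (the weights `W` and biases `b` here are made-up illustrative values, not from any actual trained network):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical first-layer parameters: W has one row of (w1, w2) per hidden
# neuron, b one bias per neuron (the names W and b are assumptions).
W = np.array([[1.0, -1.0],
              [0.5,  2.0]])
b = np.array([0.0, -1.0])

xs = np.linspace(-3, 3, 100)
for (w1, w2), bias in zip(W, b):
    # Each hidden neuron's pre-activation vanishes along w1*x + w2*y + bias = 0,
    # i.e. y = -(w1*x + bias) / w2 (assuming w2 != 0).
    plt.plot(xs, -(w1 * xs + bias) / w2)
```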

However, as more layers are added I'm not sure what approach to use, and the visualization part is rarely handled in textbooks. I am wondering, is there a straightforward way of transforming the weight matrices from the different layers into this non-linear decision boundary (assuming 2D inputs)?

Many thanks,

johnblund

1 Answer


One approach to plotting decision boundaries (for either a linear or a non-linear classifier) is to sample points on a uniform grid and feed them to the classifier. Assuming X is your data, you can create a uniform grid of points as follows:

import numpy as np

h = .02  # step size in the mesh
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

Then, feed those coordinates to your perceptron to obtain its predictions:

Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

Assuming clf is your trained Perceptron, np.c_ stacks the uniformly sampled grid coordinates into feature vectors, clf.predict feeds them to the classifier, and Z captures the predictions.
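To make the np.c_ step concrete, here is a tiny 2x2 grid (the values are illustrative only): each grid point becomes one row, i.e. one 2-feature sample, in the matrix handed to the classifier.

```python
import numpy as np

# A tiny grid to show what np.c_ does with the raveled coordinates
xx, yy = np.meshgrid(np.arange(0, 2), np.arange(0, 2))
pts = np.c_[xx.ravel(), yy.ravel()]
print(pts)  # four rows, one (x, y) pair per grid point
```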

Finally, plot the decision boundaries as a contour plot (using matplotlib):

Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)

Optionally, also plot your data points:

plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)

A fully working example is available; credit for it goes to scikit-learn (which, by the way, is a great machine learning library that ships with a working Perceptron implementation).
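Putting the snippets above together, here is a self-contained sketch. It assumes scikit-learn's MLPClassifier and the make_moons toy dataset in place of your own network and data, so treat the model and dataset choices as illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# Toy non-linear dataset and a small two-hidden-layer network
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=2000, random_state=0)
clf.fit(X, y)

# Uniform grid over the input space
h = .02  # step size in the mesh
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

# Predict every grid point and draw the resulting decision regions
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.show()
```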

Arjun Ashok
Imanol Luengo
  • Thanks, nicely explained. – johnblund Oct 03 '15 at 10:47
  • Just another question, as this is the first NN I have programmed. Is it convention that the patterns are row vectors in the input matrix (X), as in your example? – johnblund Oct 05 '15 at 08:46
  • @johnblund I'm not 100% sure whether it is different for NNs (as I don't work with them), but in general for machine learning the convention is to have the training data in an `X = [n_samples x n_features]` matrix and the ground truth in a matrix `y = [n_samples x 1]` or `y = [n_samples x n_classes]` (for a NN it would be `y = [n_samples x n_outputs]`, with `n_outputs` the number of outputs of the last layer, which usually corresponds to `n_classes`). – Imanol Luengo Oct 05 '15 at 09:37