
I am trying to make sure I'm using the correct terminology. The diagram below shows the MNIST example:

X is a 784-element row vector
W is a 784x10 matrix
b is a 10-element row vector
The output of the linear box is fed into softmax
The output of softmax is fed into the distance function, cross-entropy
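For concreteness, here is a minimal numpy sketch of that computation (the variable names, random initialization, and the example label are my own assumptions, not the tutorial's code):

```python
import numpy as np

def softmax(logits):
    # shift by the max for numerical stability
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

def cross_entropy(probs, label):
    # negative log-likelihood of the true class
    return -np.log(probs[label])

x = np.random.rand(784)               # one flattened 28x28 image as a row vector
W = np.random.randn(784, 10) * 0.01   # 784x10 weight matrix
b = np.zeros(10)                      # 10-element bias vector

logits = x @ W + b                    # the "linear" box: 10 scores
probs = softmax(logits)               # softmax turns scores into probabilities
loss = cross_entropy(probs, label=3)  # cross-entropy against an example true class
```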

[image: diagram of the MNIST example: x -> linear (W, b) -> softmax -> cross-entropy]

How many layers are in this NN? What are the input and hidden layers in that example?

Similarly, how many layers are in this answer? If my understanding is correct, is it 3 layers?

Edit

@lejlot Does the below represent a 3-layer NN with 1 hidden layer?

[image: revised diagram showing a network with one hidden layer]

  • You are confusing the notation. The input layer is the vector x where you place the input data. Then the operation -> *w -> +b -> f() -> is the connection between the first layer and the second layer. The second layer is the vector where you store the result z=f(x*w1+b1), then softmax(z*w2+b2) is the connection between the second and the third layer. The third layer is the vector y where you store the final result y=softmax(z*w2+b2). Cross entropy is not a layer; it is the cost function used to train your neural network. – Rob Nov 13 '16 at 21:30

2 Answers


Take a look at this picture:

http://cs231n.github.io/assets/nn1/neural_net.jpeg

In your first picture you have only two layers:

  • Input layer -> 784 neurons
  • Output layer -> 10 neurons

Your model is too simple: w directly contains the connections between the input and the output, and b contains the bias terms.

With no hidden layer you obtain a linear classifier, because a linear combination of linear combinations is again a linear combination. The hidden layers are what introduce non-linear transformations into your model.
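A toy numpy check makes this concrete (the layer sizes here are arbitrary assumptions of mine): two stacked purely linear layers collapse into a single linear layer.

```python
import numpy as np

x = np.random.rand(784)
W1, b1 = np.random.randn(784, 100), np.random.randn(100)
W2, b2 = np.random.randn(100, 10), np.random.randn(10)

# two linear layers with no activation in between
two_layers = (x @ W1 + b1) @ W2 + b2

# the same mapping expressed as one linear layer
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True
```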

In your second picture you have 3 layers, but you are confusing the notation (a code sketch follows the list below):

  • The input layer is the vector x where you place the input data.
  • Then the operation -> *w1 -> +b1 -> f() -> is the connection between the first layer and the second layer.
  • The second layer is the vector where you store the result z=f(x*w1+b1).
  • Then softmax(z*w2+b2) is the connection between the second and the third layer.
  • The third layer is the vector y where you store the final result y=softmax(z*w2+b2).
  • Cross entropy is not a layer; it is the cost function used to train your neural network.
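Putting those pieces together, a minimal numpy sketch of the 3-layer network described above (the hidden size of 100 and the use of sigmoid for f are assumptions of mine):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(logits):
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

x = np.random.rand(784)                            # layer 1: the input vector
W1, b1 = np.random.randn(784, 100), np.zeros(100)  # connections between layers 1 and 2
W2, b2 = np.random.randn(100, 10), np.zeros(10)    # connections between layers 2 and 3

z = sigmoid(x @ W1 + b1)   # layer 2: hidden result z = f(x*w1 + b1)
y = softmax(z @ W2 + b2)   # layer 3: final result y = softmax(z*w2 + b2)
```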

EDIT:

One more thing: if you want to obtain a non-linear classifier you must add a non-linear transformation in every hidden layer. In the example I described, that means f() must be a non-linear function (for example sigmoid, softsign, ...) in z=f(x*w1+b1).

If you add a non-linear transformation only in the output layer (the softmax function that you have at the end), your model is still a linear classifier.
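One quick way to see this (a toy numpy check with made-up shapes): softmax preserves the ordering of its inputs, so the class predicted from the softmax output is always the class with the largest value of the purely linear function x*w + b; the decision rule stays linear.

```python
import numpy as np

def softmax(logits):
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

x = np.random.rand(784)
W, b = np.random.randn(784, 10), np.random.randn(10)

logits = x @ W + b
probs = softmax(logits)

# softmax does not change which class scores highest,
# so the decision boundary is determined by the linear map alone
print(np.argmax(logits) == np.argmax(probs))  # True
```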

  • I think your original answer was correct. Non-linearity is unrelated to hidden-ness. Slap a sigmoid on after the bias term has been added, and it's non-linear. – Aske Doerge Nov 13 '16 at 19:07
  • Actually Rob is right, there are no hidden layers here, although the image is slightly tricky/non-standard. And what @Aske suggests (putting a sigmoid "after the bias") does not matter; the model will still be linear. – lejlot Nov 13 '16 at 19:44
  • @lejlot I think I understand your point. Can you please comment on the edit I have made? – Sam Hammamy Nov 13 '16 at 19:59
  • Yeah, the new one has 1 hidden layer (usually you would add some non-linearity in between, like sigmoid or relu, but even without it, it is still a net with 1 hidden layer now). However, the naming convention is slightly different: "input" is your data, then "hidden" is whatever is produced after the first w1*x+b1, and "output" is what is produced after w2*h+b2. The weights are usually called "weights between input and hidden layer" and "weights between hidden and output". Thus I would draw the boxes the other way around, but it is just a matter of naming convention. – lejlot Nov 13 '16 at 20:02
  • One more thing @SamHammamy, you are using a completely different notation. You are calling the weights W and bias b a layer, when they are actually the connections between layers. Take a look at the picture I have linked. You place the data x in the input layer. Then w and b are the connections between layers: they cross from the first layer to the second in the operation z=x*w+b, and you store the result z in the second layer. So the layers are the places where you store x and z, and w and b are the connections between them. – Rob Nov 13 '16 at 21:22
  • @lejlot I guess the graph part of the TensorFlow tutorials is seeping into my mental model. That's an easier way for me to understand things for now. But both your points are taken. – Sam Hammamy Nov 13 '16 at 22:41
  • @rob ditto above ^^ – Sam Hammamy Nov 13 '16 at 22:41

That has 1 hidden layer.

The answer you link to I would call a 2-hidden-layer NN.

Your input layer is the X vector. The Wx+b layer is the hidden layer, i.e. the box in your picture. The output layer is the softmax. The cross-entropy is your loss/cost function and is not a layer at all.

  • I think that he doesn't have a hidden layer. The NN has an input layer of 784 neurons and an output layer of 10 neurons (it is a 10-class classification problem). With the matrix w he directly redirects the input to the output. – Rob Nov 13 '16 at 18:56
  • I see it as there is an input layer, and an output layer. A place where you put data in, and where you get a prediction out. Everything in between those are hidden layers. – Aske Doerge Nov 13 '16 at 19:12
  • There is no hidden layer. Softmax is merely a normalization factor, and usually it is even inside the loss itself. "Layers" are usually parametrized transformations, and softmax is not parametrized (there is nothing to be learnt in softmax), thus it can be safely put inside the loss itself. – lejlot Nov 13 '16 at 19:43
  • If you put the softmax inside the loss you don't normalize predictions on non-training data. That would be pretty weird and unexpected. – Aske Doerge Nov 13 '16 at 22:44