I'm trying to teach myself machine learning and I have a similar question to this.
Is this correct:
For example, if I have an input matrix, where X1, X2 and X3 are three numerical features (e.g. say they are petal length, stem length, flower length, and I'm trying to label whether the sample is a particular flower species or not):
x1 x2 x3 label
5 1 2 yes
3 9 8 no
1 2 3 yes
9 9 9 no
That you take the vector of the first ROW (not column) of the table above to be inputted into the network like this:
i.e. there would be three neurons (1 for each value of the first table row), and then w1,w2 and w3 are randomly selected, then to calculate the first neuron in the next column, you do the multiplication I have described, and you add a randomly selected bias term. This gives the value of that node.
This is done for a set of nodes (i.e. each column actually will have four nodes (three + a bias), for simplicity, i removed the other three nodes from the second column), and then in the last node before the output, there is an activation function to transform the sum into a value (e.g. 0-1 for sigmoid) and that value tells you whether the classification is yes or no.
I'm sorry for how basic this is, I want to really understand the process, and I'm doing it from free resources. So therefore generally, you should select the number of nodes in your network to be a multiple of the number of features, e.g. in this case, it would make sense to write:
from keras.models import Sequential
from keras.models import Dense
model = Sequential()
model.add(Dense(6,input_dim=3,activation='relu'))
model.add(Dense(6,input_dim=3,activation='relu'))
model.add(Dense(3,activation='softmax'))
What I don't understand is why the keras model has an activation function in each layer of the network and not just at the end, which is why I'm wondering if my understanding is correct/why I added the picture.
Edit 1: Just a note I saw that in the bias neuron, I put on the edge 'b=1', that might be confusing, I know the bias doesn't have a weight, so that was just a reminder to myself that the weight of the bias node is 1.