Let's say I want to solve a multi-label classification problem using neural networks and Keras.
The targets are typically of the form y = [0, 1, 0, 1, 0, 0], and it is straightforward to train a network using binary cross entropy with sigmoids on the outputs (e.g. see the code below).
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# One hidden layer with 6 neurons and relu activation
model.add(Dense(6, activation='relu', input_dim=xtrain.shape[1]))
# Output layer: 6 outputs, with sigmoid so each lands in [0, 1]
model.add(Dense(6, activation='sigmoid'))
model.compile(optimizer='Adam', loss='binary_crossentropy')
model.fit(xtrain, ytrain, batch_size=128)
When I call fit on the last line, what actually happens implementation-wise?
Is the network updated multiple times, once after computing the error of each of the 6 outputs and propagating it back to update the weights?
Or does it compute the error for each of the outputs separately and then make one overall update of the network?
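For concreteness, here is a sketch of my current understanding (the numbers are made up, and I am assuming the documented behaviour of Keras's binary_crossentropy, which averages the per-output errors over the last axis):

import numpy as np

# Made-up target and prediction for one example with 6 outputs
y_true = np.array([0., 1., 0., 1., 0., 0.])
y_pred = np.array([0.1, 0.8, 0.3, 0.6, 0.2, 0.1])

# Per-output binary cross entropy: one error value per output
bce_per_output = -(y_true * np.log(y_pred)
                   + (1 - y_true) * np.log(1 - y_pred))

# As far as I understand, Keras averages these into a single scalar
# per example, and gradients are taken of that scalar
loss = bce_per_output.mean()
print(bce_per_output)  # 6 separate errors
print(loss)            # 1 combined scalar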
Edit: updated question after Daniel Möller's answer
model.fit(xtrain, ytrain, batch_size=1)
My question is probably clearer with a batch_size of 1.
At each iteration, we pick one example from the training set and feed it forward. Then we compute the error made on each output. In this case, my questions are the following (a sketch of the two alternatives follows below):
For the weights that are not shared across outputs (the weights from the hidden layer to each output), are they updated based on the error of the model computed as the sum of the errors on ALL outputs, or only on the error of one specific output?
Are the model's weights updated once, based on the sum of the errors, or is the model updated multiple times, based on the individual errors made on each output?
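To make the two alternatives concrete, here is a minimal sketch of the first one using tf.keras and GradientTape (the input size of 4 and the random example are made up, and I am assuming tf.keras rather than the standalone keras imports above); my question is essentially whether fit behaves like this single combined update, or like several per-output updates:

import tensorflow as tf

# Toy model mirroring the one above: a shared hidden layer and 6 sigmoid outputs
model = tf.keras.Sequential([
    tf.keras.layers.Dense(6, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(6, activation='sigmoid'),
])
optimizer = tf.keras.optimizers.Adam()
bce = tf.keras.losses.BinaryCrossentropy()

x = tf.random.normal((1, 4))                 # one example (batch_size=1)
y = tf.constant([[0., 1., 0., 1., 0., 0.]])

with tf.GradientTape() as tape:
    y_pred = model(x)
    loss = bce(y, y_pred)  # one scalar: the mean of the 6 per-output errors

# A single gradient computation and a single update for all weights.
# By linearity of the gradient, a shared (input-to-hidden) weight receives
# the sum of the contributions from all 6 outputs, while a hidden-to-output
# weight only receives gradient from its own output.
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))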