I am trying to create a seq2seq network whose backbone is the architecture defined in this machinelearningmastery blog post. Basically, the original example uses 6-element input and 3-element output sequences, while mine uses 32-element input and 32-element output sequences.
More precisely, the original model uses random integer sequences of 6 values, each ranging from 1 to 50. My model uses sequences of 32 values, each ranging from 0 to 255.
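For context, this is roughly how I prepare the data and build the model. It is a simplified sketch: the hidden size, helper names, and the placeholder target transformation are my own choices, and only the training model from the blog's encoder-decoder wiring is shown:

import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM, Dense

n_features = 257   # values 0-255 plus one reserved token, one-hot encoded
seq_len = 32       # my sequences are 32 elements long
n_units = 128      # hidden size (my choice)

def generate_pair():
    # random source sequence of 32 values in 0-255
    src = np.random.randint(0, 256, seq_len)
    # in my real setup the target is derived from the source;
    # reversing it here is just a placeholder
    tgt = src[::-1]
    to_onehot = lambda s: np.eye(n_features)[s]
    return to_onehot(src), to_onehot(tgt)

# encoder-decoder wiring, following the blog's training model
encoder_inputs = Input(shape=(None, n_features))
_, state_h, state_c = LSTM(n_units, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None, n_features))
decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_outputs = Dense(n_features, activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])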
As I was watching the model train I noticed that there are two criteria for judging convergence: the loss
and the metrics (usually accuracy). Typical info looks something like:
99776/100000 [============================>.] - ETA: 0s - loss: 0.0223 - acc: 0.9967
99904/100000 [============================>.] - ETA: 0s - loss: 0.0223 - acc: 0.9967
100000/100000 [==============================] - 40s 400us/step - loss: 0.0223 - acc: 0.9967
OK, in the (simple) blog example I can see the loss going down while the acc goes up to 1 simultaneously. In my case (which is a harder problem to solve), the loss goes down but the accuracy climbs to 1.0 quite rapidly compared to the loss. Some observations I have made from these info messages:
- After some iterations I can have an accuracy of 1.0 while my loss is still clearly above zero (e.g. 0.0222). I mean, I don't get a loss on the order of 1e-5 or anything like that.
- At the start of each epoch there seems to be a loss gap and, especially, an accuracy gap. I found a good explanation about this here. In summary, during an epoch the displayed loss and acc are running means over all batches seen so far, while at the start of every new epoch they are reinitialized from the current batch.
- Even when the acc is fixed at 1.0, when I evaluate performance on my training data I do not get results high enough to justify this accuracy (see the snippet after this list for how I evaluate).
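To be concrete about that last point, this is roughly how I check performance on the training data. It is a sketch: training_samples stands in for my own list of (encoder input, decoder input, target) triples, and decoding is just a per-timestep argmax (model and np come from the snippet above):

def decode(onehot_seq):
    # collapse a one-hot (32, 257) array back to 32 integer values
    return np.argmax(onehot_seq, axis=-1)

exact = 0
for X1, X2, y in training_samples:   # my own (encoder input, decoder input, target) triples
    yhat = model.predict([X1[np.newaxis], X2[np.newaxis]])[0]
    if np.array_equal(decode(yhat), decode(y)):
        exact += 1                    # count only exact whole-sequence matches
print('exact-match ratio:', exact / len(training_samples))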
So, my question is: what is this accuracy actually showing? I know that it's meant only for the programmer (me, that is) and is not used by the model itself (unlike the loss, which is), but how is it calculated and what does it represent? In my case I have sequences of 32 values ranging from 0 to 255, so each sequence is represented as an array of shape (32, 257). Does an accuracy of 0.99 mean that 99 out of 100 sequences match? Is that so?
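In case it clarifies what I'm asking, these are the two interpretations I can think of, written out in numpy (function names are mine; both expect arrays of shape (n, 32, 257)):

def per_timestep_accuracy(y_true, y_pred):
    # fraction of individual positions (over all sequences and all
    # 32 timesteps) whose argmax class matches
    return np.mean(np.argmax(y_true, -1) == np.argmax(y_pred, -1))

def per_sequence_accuracy(y_true, y_pred):
    # fraction of sequences where every one of the 32 positions matches
    matches = np.argmax(y_true, -1) == np.argmax(y_pred, -1)
    return np.mean(matches.all(axis=1))

Which of these (if either) does the displayed acc correspond to?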