
I am trying to create a seq2seq network whose backbone is the architecture defined in this machinelearningmastery blog. Basically, the original example uses input sequences of 6 elements and output sequences of 3 elements, while mine uses 32-element inputs and 32-element outputs.

More precisely, the original model uses random integer sequences of 6 values (each ranging from 1 to 50), while my model uses sequences of 32 values (each ranging from 0 to 255).
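For reference, the encoding of a single sequence looks roughly like this (a minimal sketch of my own; I follow the blog's convention of reserving index 0, which is why a 0-255 value range ends up with 257 classes):

import numpy as np
from keras.utils import to_categorical

# One sequence of 32 values in the range 0-255, shifted by one so that
# index 0 stays reserved (as in the blog example), then one-hot encoded.
seq = np.random.randint(0, 256, size=32) + 1
encoded = to_categorical(seq, num_classes=257)
print(encoded.shape)   # (32, 257)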

As I was watching the model training I noticed that there are two criteria for judging convergence: the loss and the metrics (usually accuracy). A typical info line looks something like this:

99776/100000 [============================>.] - ETA: 0s - loss: 0.0223 - acc: 0.9967
99904/100000 [============================>.] - ETA: 0s - loss: 0.0223 - acc: 0.9967
100000/100000 [==============================] - 40s 400us/step - loss: 0.0223 - acc: 0.9967

OK, in the (simple) blog example I can see the loss going down while the accuracy simultaneously goes up to 1. In my case, though (which is a harder problem to solve), the accuracy rises to 1.0 quite rapidly compared to how the loss drops. Some observations I have made while checking these info messages:

  • After some iterations I can reach an accuracy of 1.0 while my loss is still clearly above zero (e.g. 0.0222); I never get a loss on the order of magnitude of, say, 1e-5.
  • At the start of each epoch there seems to be a gap in the loss and, even more so, in the accuracy. I found a good explanation about this here. In summary, it says that the loss and accuracy displayed during an epoch are mean values over all batches seen so far, and at every new epoch they are re-initialized to the actual current value (see the sketch after this list).
  • Even when the accuracy is fixed at 1.0 and I evaluate performance on my training data, I do not get results good enough to justify such a high accuracy.
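Here is a tiny sketch of the running-average behaviour mentioned in the second point (just the averaging idea, not Keras internals):

batch_losses = [0.50, 0.30, 0.10, 0.05]   # hypothetical per-batch losses

running = []
total = 0.0
for i, loss in enumerate(batch_losses, start=1):
    total += loss
    running.append(total / i)             # what the progress bar shows after batch i

print(running)   # approximately [0.5, 0.4, 0.3, 0.2375]

At the start of the next epoch the average starts over, which is why the first displayed values can jump compared to the end of the previous epoch.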

So, my question is: what is this accuracy showing, anyway? I know it is meant only for the programmer (me, that is) and is not used by the model itself (unlike the loss, which is), but how is it calculated, and what does it represent? In my case I have sequences of 32 values ranging from 0 to 255, so each sequence is represented as an array of shape (32, 257). So, does an accuracy of 0.99 mean that 99 out of 100 sequences match? Is that so?

Eypros
  • Answer here may be useful (disclaimer: mine), at least for a high-level understanding: [Loss & accuracy - Are these reasonable learning curves?](https://stackoverflow.com/questions/47817424/loss-accuracy-are-these-reasonable-learning-curves/47819022#47819022) – desertnaut Feb 15 '19 at 11:59

1 Answer


So, my question is what is this accuracy showing anyway?

As explained in this answer here, the actual accuracy metric that Keras chooses depends on the loss that you have chosen. I would guess that, in your case, it defaults to categorical_accuracy:

# from the Keras source; K is the Keras backend (keras.backend)
def categorical_accuracy(y_true, y_pred):
    return K.cast(K.equal(K.argmax(y_true, axis=-1),
                          K.argmax(y_pred, axis=-1)), K.floatx())

which would mean that it compares whether the maximum values in y_true and y_pred occur at the same position along the last axis. That, of course, would not be very meaningful for you.
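To make concrete what it does (and does not) measure, here is a rough numpy sketch (my own illustration, not the exact Keras code path) of what such a metric does with one-hot sequence data of shape (batch, 32, 257):

import numpy as np

# The argmax is taken over the last axis, so every time step is scored
# independently -- the metric is NOT a whole-sequence match.
y_true = np.eye(257)[np.random.randint(0, 257, size=(4, 32))]   # (4, 32, 257)
y_pred = y_true.copy()
y_pred[0, 0] = np.roll(y_pred[0, 0], 1)    # corrupt a single time step

per_step = (y_true.argmax(-1) == y_pred.argmax(-1))   # shape (4, 32)
print(per_step.mean())                  # 127/128 ~ 0.992 (per-time-step accuracy)
print(per_step.all(axis=-1).mean())     # 0.75 (fraction of sequences that match exactly)

So an accuracy of 0.99 does not mean that 99 out of 100 sequences are reproduced exactly; a sequence with a single wrong time step still contributes 31 correct positions to the average.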

mrks