
I'm implementing an LSTM model with Keras. I padded my sequences to a fixed length so that the dataset can be fed into the model with the right shape.

At the moment, my model is the following:

import tensorflow as tf
from tensorflow.keras.layers import Masking, LSTM, Dropout, Dense

model = tf.keras.Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(units=100, return_sequences=True))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
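
For reference, the padding step looks roughly like the sketch below (an illustration, not my exact code; `sequences` is assumed to be a list of arrays of shape (seq_len_i, features), and 0. is used as the pad value so it matches `mask_value=0.` above):

from tensorflow.keras.preprocessing.sequence import pad_sequences

# Right-pad every sequence with 0-vectors up to `timesteps` steps.
X = pad_sequences(sequences, maxlen=timesteps, dtype='float32',
                  padding='post', value=0.)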

Does Keras automatically skip labels of masked values in the loss function?

– pairon

1 Answer

Yes, if your model uses masking, then the objective (i.e. loss) function is automatically augmented to support masking, and therefore ignores masked samples/timesteps in the calculation of the loss. Under the hood, weighted_masked_objective is the function that does this:

def weighted_masked_objective(fn):
    """Adds support for masking and sample-weighting to an objective function.
    It transforms an objective function `fn(y_true, y_pred)`
    into a sample-weighted, cost-masked objective function
    `fn(y_true, y_pred, weights, mask)`.
    # Arguments
        fn: The objective function to wrap,
            with signature `fn(y_true, y_pred)`.
    # Returns
        A function with signature `fn(y_true, y_pred, weights, mask)`.
    """
    if fn is None:
        return None

    def weighted(y_true, y_pred, weights, mask=None):
        """Wrapper function.
        # Arguments
            y_true: `y_true` argument of `fn`.
            y_pred: `y_pred` argument of `fn`.
            weights: Weights tensor.
            mask: Mask tensor.
        # Returns
            Scalar tensor.
        """
        # score_array has ndim >= 2
        score_array = fn(y_true, y_pred)
        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in Theano
            mask = K.cast(mask, K.floatx())
            # mask should have the same shape as score_array
            score_array *= mask
            #  the loss per batch should be proportional
            #  to the number of unmasked samples.
            score_array /= K.mean(mask) + K.epsilon()

        # apply sample weighting
        if weights is not None:
            # reduce score_array to same ndim as weight array
            ndim = K.ndim(score_array)
            weight_ndim = K.ndim(weights)
            score_array = K.mean(score_array,
                                 axis=list(range(weight_ndim, ndim)))
            score_array *= weights
            score_array /= K.mean(K.cast(K.not_equal(weights, 0), K.floatx()))
        return K.mean(score_array)
    return weighted
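
If it helps, here is a minimal sketch (mine, not part of the Keras source) of the same mask-and-rescale idea written with plain TensorFlow ops; the shapes are chosen to match your per-timestep sigmoid output, and binary cross-entropy is just an assumed stand-in for whatever loss you compile with:

import tensorflow as tf

# One sequence of 3 timesteps; the last timestep is padding (mask = 0).
y_true = tf.constant([[1., 0., 0.]])
y_pred = tf.constant([[0.9, 0.2, 0.7]])
mask   = tf.constant([[1., 1., 0.]])

# Per-timestep binary cross-entropy, shape (1, 3).
per_step = tf.keras.losses.binary_crossentropy(
    tf.expand_dims(y_true, -1), tf.expand_dims(y_pred, -1))

# Zero out masked timesteps and rescale by the fraction of unmasked ones,
# exactly as in `weighted_masked_objective` above.
masked = per_step * mask
masked /= tf.reduce_mean(mask) + tf.keras.backend.epsilon()
loss = tf.reduce_mean(masked)  # the padded timestep contributes nothing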
– today
  • Thank you. Is it possible that the masking has no effect because I used `scikit-learn`’s `StandardScaler` on my dataset (maybe the vectors of zeros have been scaled to non-zero values)? – pairon Apr 10 '20 at 23:44
  • @pairon Indeed, they might no longer be all-zero vectors after mean-std scaling; hence, the zero-masking would not be effective. To prevent that, first perform the scaling and then add the zero padding to the sequences; a short sketch of this ordering is given after these comments. – today Apr 10 '20 at 23:49
  • Another question: when I use `predict` on the test set, are the padded vectors of the test set ignored during prediction? – pairon Apr 14 '20 at 10:53
  • @pairon Masking is active in both the training and inference phases. So yes, they will be ignored in prediction as well. – today Apr 14 '20 at 10:58
  • Thank you. So, is it normal that in the predict phase I obtain non-zero values for the padding vectors? Example of my output: `[9.3943197e-01 9.6559024e-01 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04 6.2561035e-04]`. Only the first two values do not correspond to padded vectors. – pairon Apr 14 '20 at 17:38
  • @pairon Yes, those are garbage/irrelevant values. Also see [this answer](https://stackoverflow.com/a/53470422/2099607), which investigates and answers more or less the same questions (especially the last part about computation of the loss). – today Apr 14 '20 at 17:48
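
For the `StandardScaler` point discussed above, a sketch of the scale-then-pad order (variable names like `sequences` are assumptions, not from the thread):

import numpy as np
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Fit the scaler on the real (unpadded) timesteps only.
scaler = StandardScaler().fit(np.concatenate(sequences, axis=0))
scaled = [scaler.transform(s) for s in sequences]

# Pad afterwards, so the padding stays exactly 0. and matches mask_value=0.
X = pad_sequences(scaled, maxlen=timesteps, dtype='float32',
                  padding='post', value=0.)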