
I know others have posted similar questions already, but I couldn't find a solution that was appropriate here.

I've written a custom Keras layer to average outputs from DistilBert based on a mask. That is, the input is dim=[batch_size, n_tokens_out, 768], and I average along n_tokens_out using a mask that is dim=[batch_size, n_tokens_out]. The output should be dim=[batch_size, 768]. Here's the code for the layer:

class CustomPool(tf.keras.layers.Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(CustomPool, self).__init__(**kwargs)
    
    def call(self, x, mask):
        masked = tf.cast(tf.boolean_mask(x, mask = mask, axis = 0), tf.float32)
        mn = tf.reduce_mean(masked, axis = 1, keepdims=True)
        return tf.reshape(mn, (tf.shape(x)[0], self.output_dim))
    
    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

The model compiles without error, but as soon as the training starts I get this error:

InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  Input to reshape is a tensor with 967 values, but the requested shape has 12288
     [[node pooled_distilBert/CustomPooling/Reshape (defined at <ipython-input-245-a498c2817fb9>:13) ]]
     [[assert_greater_equal/Assert/AssertGuard/pivot_f/_3/_233]]
  (1) Invalid argument:  Input to reshape is a tensor with 967 values, but the requested shape has 12288
     [[node pooled_distilBert/CustomPooling/Reshape (defined at <ipython-input-245-a498c2817fb9>:13) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_211523]

Errors may have originated from an input operation.
Input Source operations connected to node pooled_distilBert/CustomPooling/Reshape:
 pooled_distilBert/CustomPooling/Mean (defined at <ipython-input-245-a498c2817fb9>:11)

Input Source operations connected to node pooled_distilBert/CustomPooling/Reshape:
 pooled_distilBert/CustomPooling/Mean (defined at <ipython-input-245-a498c2817fb9>:11)

The tensor I get back has fewer values than expected, which is strange to me.
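For what it's worth, the mismatch can be reproduced in isolation with made-up toy shapes (a standalone sketch, not the real model): `tf.boolean_mask` with a 2-D mask collapses the batch and token axes into a single axis, so the batch dimension the later `tf.reshape` expects is gone.

```python
import tensorflow as tf

# Toy shapes for illustration: batch_size=2, n_tokens=4, hidden=3
x = tf.random.normal([2, 4, 3])
mask = tf.constant([[True, True, False, False],
                    [True, True, True, False]])

# A 2-D boolean mask collapses the first two axes of x into one:
# one row per True entry, so the batch structure is lost.
masked = tf.boolean_mask(x, mask, axis=0)
print(masked.shape)  # (5, 3)

# reduce_mean over axis=1 then yields [num_true, 1], whose element
# count no longer matches a (batch_size, hidden) reshape target.
mn = tf.reduce_mean(masked, axis=1, keepdims=True)
print(mn.shape)  # (5, 1)
```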

Here is what the model looks like (TFDistilBertModel is from the huggingface transformers library):

dbert_layer = TFDistilBertModel.from_pretrained('distilbert-base-uncased')

in_id = tf.keras.layers.Input(shape=(seq_max_length,), dtype='int32', name="input_ids")
in_mask = tf.keras.layers.Input(shape=(seq_max_length,), dtype='int32', name="input_masks")
    
dbert_inputs = [in_id, in_mask]
dbert_output = dbert_layer(dbert_inputs)[0]
x = CustomPool(output_dim = dbert_output.shape[2], name='CustomPooling')(dbert_output, in_mask)
dense1 = tf.keras.layers.Dense(256, activation = 'relu', name='dense256')(x)
pred = tf.keras.layers.Dense(n_classes, activation='softmax', name='MODEL_OUT')(dense1)

model = tf.keras.models.Model(inputs = dbert_inputs, outputs = pred, name='pooled_distilBert')

Any help here would be greatly appreciated. I had a look through existing questions, but most end up being solved by specifying an input shape, which is not applicable in my case.

bmt
  • Could you show us how you are using it? – Aditya Mishra Jul 31 '20 at 07:19
  • Added model code! @AdityaMishra – bmt Jul 31 '20 at 07:48
  • Could you check the shapes of "masked" & "mn" inside your call for the CustomPool layer. Your masked is of shape - [batch_size, 768] & hence mn - [batch_size, 1]. Now, converting a scalar into a vector of size 768 is not possible. Hence, the error. For a batch size of 3 and sequence length 32 I get the following shapes here in the [image](https://imgur.com/jancvfc.png) – Aditya Mishra Jul 31 '20 at 10:07
  • Hmm, I'm not sure what you used for mask there, but my mask is `[batch_size, n_tokens]`, and `x` is `[batch_size, n_tokens, 768]`. So when I mask I'm left with a `masked` size of `[batch_size, y, 768]` where `y` is between 1 and n_tokens (depending on the value of the mask). `reduce_mean` is supposed to collapse y --> 1. – bmt Jul 31 '20 at 15:29
  • Exactly those are the shapes that I get don't I? Since, I had provided `batch size=32` but used only 3 sentences for trial purpose & `sequence length=32`, the `mask => (3, 32)` and `x => (3, 32, 768)`. Are you sure your masked is of size `[batch_size, y, 768]` coz I'm getting `[batch_size, 768]`. If you have a colab notebook or something, you can share with me – Aditya Mishra Jul 31 '20 at 16:08

1 Answer


Using tf.reshape before a pooling layer

I know my answer is kind of late, but I want to share my solution to the problem. The issue is reshaping to a fixed shape during model training: the tensor feeding the reshape changes size from batch to batch, so a fixed `tf.reshape(updated_inputs, shape=fixed_shape)` will trigger exactly this error (it was my problem too). Hope it helps.
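One way to avoid the reshape entirely is to compute the masked average with a multiply-sum-divide, which keeps the batch axis intact throughout (a minimal sketch, not the asker's exact layer; `MaskedMeanPool` is an illustrative name):

```python
import tensorflow as tf

class MaskedMeanPool(tf.keras.layers.Layer):
    """Mean over the token axis, counting only positions where mask == 1."""

    def call(self, x, mask):
        # x:    [batch_size, n_tokens, hidden]
        # mask: [batch_size, n_tokens], 1 for real tokens, 0 for padding
        mask = tf.cast(mask, x.dtype)[:, :, tf.newaxis]  # [batch, n_tokens, 1]
        summed = tf.reduce_sum(x * mask, axis=1)         # [batch, hidden]
        counts = tf.reduce_sum(mask, axis=1)             # [batch, 1]
        return summed / tf.maximum(counts, 1.0)          # guard divide-by-zero
```

Because every op here preserves the leading batch dimension, the output is always [batch_size, hidden] with no `tf.reshape` needed.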

Minh Vu
  • 11
  • 2