I am studying the transformer code here as an example.
The positional encoding function below uses numpy for all of its operations, then casts the result back to a TF tensor when returning it. Does such a pattern impact performance, especially when I run the code on a GPU? Is it recommended to use only TF operations when implementing a model?
def get_positional_encoding(self, max_len):
    """PE_(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE_(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    pos = np.expand_dims(np.arange(0, max_len), axis=1)
    div_term = np.array([[1 / np.power(10000, (2 * (i // 2) / self.d_model)) for i in range(self.d_model)]])
    pos = pos * div_term
    pe = np.zeros((max_len, self.d_model))
    pe[:, 0:self.d_model // 2] = np.sin(pos[:, 0::2])
    pe[:, self.d_model // 2:] = np.cos(pos[:, 0::2])
    pe = np.expand_dims(pe, 0)
    print(pe.shape)
    return tf.cast(pe, dtype=tf.float32)
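For comparison, here is a minimal sketch of the same computation written with only TF ops. It is a standalone function rather than a class method (so `d_model` is passed explicitly, an assumption on my part), and it keeps the same channel layout as the snippet above: sines in the first half of the channels, cosines in the second half.

```python
import tensorflow as tf

def get_positional_encoding_tf(max_len, d_model):
    # Positions as a (max_len, 1) float tensor.
    pos = tf.cast(tf.range(max_len)[:, tf.newaxis], tf.float32)
    # Frequency terms, mirroring 1 / 10000^(2*(i//2) / d_model) from the numpy version.
    i = tf.range(d_model, dtype=tf.float32)
    div_term = 1.0 / tf.pow(10000.0, 2.0 * tf.floor(i / 2.0) / d_model)
    angles = pos * div_term[tf.newaxis, :]  # (max_len, d_model)
    # Same layout as the numpy code: sin in the first half, cos in the second.
    pe = tf.concat([tf.sin(angles[:, 0::2]), tf.cos(angles[:, 0::2])], axis=-1)
    return pe[tf.newaxis, ...]  # (1, max_len, d_model)
```

Both versions produce the same `(1, max_len, d_model)` tensor; the difference is that this one builds the encoding entirely out of TF ops, so the computation stays on the device rather than running in numpy on the host first.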