Ok, so because I didn't know the answer and was curious, I ran some experiments. I first created a sequence of 3 time steps and 3 features:
import numpy as np
import tensorflow as tf

inputs = np.ones([1, 3, 3]).astype(np.float32)
and I created a simple network where I print two intermediate outputs:
inp = tf.keras.layers.Input(shape=(3, 3))
mask = tf.keras.layers.Masking(mask_value=-np.inf)(inp)
out = tf.keras.layers.Dense(1,
                            kernel_initializer=tf.keras.initializers.Ones(),
                            use_bias=False)(mask)

model_mask = tf.keras.models.Model(inp, mask)
model = tf.keras.models.Model(inp, out)

print(model_mask(inputs))
print(model(inputs))
I used a Dense layer because it supports masking and makes it easy to see what is happening, but the process is the same with RNNs. I also set the mask value to -inf to check whether the masked values really are masked. The weights of the Dense layer are set to one and biases are disabled, so for each time step this Dense layer computes the sum of its inputs.
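As a quick sanity check (my own addition, pure NumPy), a Dense(1) layer with a ones kernel and no bias is just a matrix product with a ones vector, i.e. a per-time-step sum:

# Dense(1) with a ones kernel and no bias = sum of the features at each step.
x = np.array([[[1., 2., 3.]]], dtype=np.float32)  # 1 sample, 1 step, 3 features
print(x @ np.ones([3, 1], dtype=np.float32))      # [[[6.]]]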
If I mask all the inputs of a time step:
inputs[0, 2, :] = -np.inf
this is what I get:
tf.Tensor(
[[[ 1.  1.  1.]
  [ 1.  1.  1.]
  [nan nan nan]]], shape=(1, 3, 3), dtype=float32)
tf.Tensor(
[[[ 3.]
  [ 3.]
  [nan]]], shape=(1, 3, 1), dtype=float32)
So the mask was correctly taken into account. (The nan itself comes from the Masking layer, which multiplies the inputs by the mask, and -inf * 0 gives nan; what matters is that the whole third time step is flagged as masked.)
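If you want to see the boolean mask itself rather than the zeroed values, you can ask the layer directly (a small sketch of my own, assuming TF 2.x in eager mode):

# compute_mask keeps a time step only if at least one of its features
# differs from mask_value, so the all -inf step is dropped.
masking = tf.keras.layers.Masking(mask_value=-np.inf)
print(masking.compute_mask(tf.constant(inputs)))
# tf.Tensor([[ True  True False]], shape=(1, 3), dtype=bool)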
If I mask only one value:
inputs[0, 2, 0] = -np.inf
and my outputs are:
tf.Tensor(
[[[ 1.   1.   1.]
  [ 1.   1.   1.]
  [-inf  1.   1.]]], shape=(1, 3, 3), dtype=float32)
tf.Tensor(
[[[ 3.]
  [ 3.]
  [-inf]]], shape=(1, 3, 1), dtype=float32)
So I conclude that the mask was not applied: the -inf simply flowed through the network.
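The reason is visible if you query the mask again (same masking layer as in the sketch above): Masking reduces over the feature axis, so a time step is masked only when all of its features equal mask_value:

# With only inputs[0, 2, 0] set to -inf, the third step still holds real
# values, so no time step is masked at all.
print(masking.compute_mask(tf.constant(inputs)))
# tf.Tensor([[ True  True  True]], shape=(1, 3), dtype=bool)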
You should therefore create your own mask.
I tried it on a small example, which I hope you can adapt to your project. First, I drop the vanilla Masking layer from Keras and use my own mask instead. The idea is to create a mask that puts a 1 on masked values and a 0 on real values. For instance, if your real values are all greater than or equal to 0, you can replace your NaN values with -1 and build your custom_mask:
inputs = np.array([[[1, 2, 1], [0.5, 2, 1], [1, 0, 3]]], dtype=np.float32)
inputs[:, 1, 0] = -1  # sentinel for a missing value
inputs[:, 2, 2] = -1

custom_mask = inputs.copy()
custom_mask[inputs >= 0] = 0  # real values
custom_mask[inputs < 0] = 1   # masked values
with inputs and custom_mask respectively being:
[[[ 1.  2.  1.]
  [-1.  2.  1.]
  [ 1.  0. -1.]]]

[[[0. 0. 0.]
  [1. 0. 0.]
  [0. 0. 1.]]]
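As a side note, the same mask can be built in one expression with np.where (just a stylistic alternative, same result):

# 1 where the value is negative (masked), 0 elsewhere.
custom_mask = np.where(inputs < 0, 1.0, 0.0).astype(np.float32)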
Then you multiply your mask by 1E9 and subtract it from your tensor (equivalently, add custom_mask * -1E9), which pushes the masked positions down to a huge negative value. A simple ReLU then sets those masked values to 0:
inp = tf.keras.layers.Input(shape=(3, 3))
input_mask = tf.keras.activations.relu(inp - custom_mask * 1E9)
out = tf.keras.layers.Dense(1,
                            kernel_initializer=tf.keras.initializers.Ones(),
                            use_bias=False)(input_mask)
model = tf.keras.models.Model(inp, out)

print(model(inputs))
which prints:
tf.Tensor(
[[[4.]
  [3.]
  [1.]]], shape=(1, 3, 1), dtype=float32)
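As a last remark, if your raw data contains NaNs directly, you can build the same custom_mask from them instead of writing the -1 sentinel by hand (a small sketch of my own; the raw array here is made up for illustration):

# Build the mask straight from the NaNs, then replace them with the sentinel.
raw = np.array([[[1, 2, 1], [np.nan, 2, 1], [1, 0, np.nan]]], dtype=np.float32)
custom_mask = np.isnan(raw).astype(np.float32)  # 1 where NaN, 0 elsewhere
inputs = np.nan_to_num(raw, nan=-1.0)           # same inputs as above

Keep in mind the two assumptions this ReLU trick makes: your real values must be non-negative (a ReLU would also zero out genuine negative values), and they must stay far below 1E9.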