I gather that Masking layers in Keras are commonly used for handling data inputs with varying timesteps. Based on the documentation, I understand that if all of the features for a given timestep equal the mask value, then that timestep will be skipped in downstream layers.
For my problem, I am instead interested in using masking for features, where the data input shape to the network is (batch_size, num_timesteps, num_features). Essentially, I want to be able to predict a timeseries one step into the future with num_features features, but assuming that I won't always have all the features from the previous timestep to base my prediction on.
For example, one could predict RGB values one timestep into the future for a pixel in a video stream based on partial data from the previous timestep. At every timestep the output should be full RGB, but at some timesteps the input may contain only RG, only RB, or only BG, and you never know in advance which partial data you'll have to make your prediction. This is why I want to be able to mark individual features as masked during training to accommodate this kind of prediction.
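To make the distinction concrete, here is a minimal plain-Python sketch of the timestep-level rule as I understand it from the documentation. It is not the Keras implementation: the helper timestep_mask and the choice of 0.0 as the "missing" sentinel are assumptions made only for illustration.

```python
# Plain-Python sketch of the timestep-masking rule as I understand it.
# `timestep_mask` is a hypothetical helper, not part of the Keras API.

MASK_VALUE = 0.0  # assumed sentinel marking a missing value


def timestep_mask(sequence, mask_value=MASK_VALUE):
    """Return one boolean per timestep: True = kept, False = skipped.

    A timestep is skipped only when every feature equals mask_value.
    """
    return [any(feature != mask_value for feature in step) for step in sequence]


# One pixel over four timesteps, features ordered (R, G, B).
pixel = [
    [0.9, 0.2, 0.4],  # full RGB
    [0.8, 0.3, 0.0],  # B missing (only RG observed)
    [0.0, 0.0, 0.0],  # everything missing
    [0.7, 0.0, 0.5],  # G missing (only RB observed)
]

mask = timestep_mask(pixel)
print(mask)  # [True, True, False, True]
```

Only the third timestep is skipped. The partially observed timesteps (only RG, only RB) are kept as a whole, so nothing tells the downstream layers which individual feature was missing, and that is the information I would like the network to have.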
It may be that Masking in Keras is not the correct mechanism to achieve this. What is the correct type of network layer that would give me this behavior?