My question is an extension to the one asked in the following post: Multivariate LSTM with missing values.
I have a multivariate time series (4 features over 14 time steps, with 300 total participants and a binary outcome). The problem: some participants have all 4 features (across all 14 time steps), but a majority of participants are missing one or more of the features across all time steps. (Every participant has at least 1 feature across all time steps.) In my dataset, 0 is a meaningful value, which precludes me from replacing all missing values with 0, as suggested in the linked post above.
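For concreteness, the data can be pictured as a `(participants, timesteps, features)` array in which a missing feature is NaN for every time step of that participant. This is a sketch of the missingness pattern described above (array names and the simulated values are mine, not from the real dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

n_participants, n_timesteps, n_features = 300, 14, 4
X = rng.normal(size=(n_participants, n_timesteps, n_features))
y = rng.integers(0, 2, size=n_participants)  # binary outcome

# Simulate the pattern: for ~70% of participants, knock out 1-3
# whole feature columns across ALL time steps.
for i in range(n_participants):
    if rng.random() < 0.7:
        n_missing = rng.integers(1, n_features)  # leave at least 1 feature
        cols = rng.choice(n_features, size=n_missing, replace=False)
        X[i, :, cols] = np.nan

# Every participant keeps at least one fully observed feature column.
assert all((~np.isnan(X[i]).any(axis=0)).any() for i in range(n_participants))
```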
I am building an LSTM classification model. I have tried the following so far without success:
(1) I have tried keeping only the subset of participants with all 4 features (across all time steps). I don't prefer this option because it discards 70% of participants.
(2) I have tried a Keras Masking layer (https://keras.io/api/layers/core_layers/masking/), replacing all missing values in the dataset with 99:

    model.add(tf.keras.layers.Masking(mask_value=99.,
                                      input_shape=(timesteps, features)))

Performance of the model dropped, and I notice in the documentation:
...if all [emphasis added] values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped)...
In my data a single time step will never have all features equal to mask_value, since every participant has at least 1 feature at every time step, so I assume masking this way cannot work.
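That reading of the documentation can be checked directly: a time step is kept whenever any feature differs from the sentinel, and skipped only when all features equal it. A minimal numpy reproduction of that rule (pure numpy, mirroring the documented semantics, with made-up values):

```python
import numpy as np

mask_value = 99.0

# One participant, 3 time steps, 4 features.
# Feature 3 is missing throughout and filled with the sentinel.
x = np.array([[0.0, 1.2, 0.5, 99.0],
              [0.3, 0.0, 2.1, 99.0],
              [1.1, 0.7, 0.9, 99.0]])

# Masking keeps a time step if ANY feature differs from mask_value,
# i.e. it skips a step only when ALL its features equal the sentinel.
keep = (x != mask_value).any(axis=-1)
print(keep)  # [ True  True  True] -- no time step is ever skipped here
```

Because at least one feature is always observed, `keep` is always all-True, so the layer never masks anything and the 99s are fed to the LSTM as ordinary values.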
(3) I've tried replacing all missing values with -1 (not a meaningful value in my data) in an effort to help the model learn that it is a marker for missingness. Again, performance dropped, and I am not sure whether this was a result of including the additional participants or of the model failing to learn the marker.
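For reference, the fill in (3) was a straight sentinel replacement along these lines (a minimal sketch, assuming missing values are stored as NaN; the array and its values are mine):

```python
import numpy as np

# (1 participant, 2 time steps, 4 features); feature 1 is missing throughout.
X = np.array([[[0.0, np.nan, 1.5, 0.2],
               [0.4, np.nan, 1.1, 0.0]]])

# Replace missing entries with -1, a value that never occurs in the real
# data, hoping the model learns it as a missingness marker.
X_filled = np.where(np.isnan(X), -1.0, X)

assert not np.isnan(X_filled).any()   # no NaNs left for the LSTM
assert (X_filled[0, :, 1] == -1.0).all()  # missing feature is now -1
```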
The core of my question: What is the best way to handle missing features in raw time series, which includes 0 as a meaningful value?