I have sequence data that tells me what color was observed for multiple subjects at different points in time. For example:
ID | Time | Color |
---|---|---|
A | 1 | Blue |
A | 2 | Red |
A | 5 | Red |
B | 3 | Blue |
B | 6 | Green |
C | 1 | Red |
C | 3 | Orange |
I want to predict the most likely color at each of the next 3 time steps, along with the probability of that color appearing. For example, for ID A, I'd like to know the next 3 items (time, color) in the sequence and the probability of each predicted color.
I understand that LSTMs are often used to predict this type of sequential data, and that I would feed in a 3D array like
input =[
[[1,1], [2,2], [5,2]], #blue at t=1, red at t=2, red at t=5 for ID A
[[0,0], [3,1], [6,3]], #nothing for first entry, blue at t=3, green at t=6 for ID B
[[0,0], [1,2], [3,4]] #nothing for first entry, red at t=1, orange at t=3 for ID C
]
after mapping the colors to numbers (Blue -> 1, Red -> 2, Green -> 3, Orange -> 4, etc.). My understanding is that, by default, the LSTM just predicts the next item in each sequence, so for example
output = [
[[7, 2]], #next item is most likely red at t=7
[[9, 2]], #next item is most likely red at t=9
[[6, 2]]
]
Is it possible to modify the output of my LSTM so that instead of just predicting the next occurrence's time and color, I get the next 3 times, colors AND the probability of each color appearing? For example, an output like
output = [
[[7, 2, 0.93], [8,2, 0.79], [10,4, 0.67]],
[[9, 2, 0.88], [11,3, 0.70], [14,3, 0.43]],
...
]
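To be concrete about what I mean by "probability": my assumption (which may be wrong) is that the model would produce one score per color at each predicted step, and I would report the top color together with its softmax probability. A toy sketch with made-up scores and hypothetical helper names:

```python
import math

ID_TO_COLOR = {1: "Blue", 2: "Red", 3: "Green", 4: "Orange"}

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def most_likely_color(scores):
    """Return (color_id, probability) for the highest-scoring color."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best + 1, probs[best]  # color IDs start at 1

# Made-up scores for one predicted step, in the order Blue, Red, Green, Orange.
scores = [0.1, 2.5, 0.3, -1.0]
color_id, prob = most_likely_color(scores)
print(ID_TO_COLOR[color_id], round(prob, 2))
```

The third number in each output triple above would be the `prob` value for that step.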
I've looked through the Keras documentation for Sequential, but I haven't found anything that clearly does this.
Furthermore, I see that a TrainX and a TrainY are typically passed to model.fit(), but I'm not sure what my TrainY would be here.