I'm currently studying RNNs, in particular LSTMs, and I'm trying to figure out how to implement a bidirectional LSTM to fill in a missing word in a sentence. I have a doubt about the structure of the training set to pass to the model's `fit` method.

If my list of sentences is composed of elements like "HI GUYS, <MISSING> ARE YOU?" and my target label is "HOW", how can the BiLSTM understand that it has to predict the missing value and not the next element of the sentence? I saw here that the advantage of a bidirectional LSTM is its ability to look at both past and future tokens to gather context and better predict the target, but I still don't get how to implement this in practice. So my questions are:
- What is the structure of my training set?
- Does the BiLSTM know which token to predict, or do I have to specify it? And how?
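To make the question concrete, here is a minimal sketch of what I currently have in mind, using Keras (the toy vocabulary, the dedicated <MISSING> token, and the one-label-per-sentence setup are all my own guesses, not something I've seen confirmed anywhere):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

# Toy vocabulary; the indices and the special <MISSING> token are my own choices
vocab = {"<MISSING>": 0, "HI": 1, "GUYS": 2, "HOW": 3, "ARE": 4, "YOU": 5}
vocab_size = len(vocab)

# Input: the full sentence with the gap replaced by the <MISSING> token.
# Target: the single word that fills the gap (one label per sentence).
x_train = np.array([[vocab["HI"], vocab["GUYS"], vocab["<MISSING>"],
                     vocab["ARE"], vocab["YOU"]]])
y_train = np.array([vocab["HOW"]])

model = Sequential([
    Embedding(vocab_size, 16),                 # word indices -> dense vectors
    Bidirectional(LSTM(32)),                   # reads the sentence in both directions
    Dense(vocab_size, activation="softmax"),   # probability over the whole vocabulary
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(x_train, y_train, epochs=10)
```

My intuition is that the <MISSING> token is what tells the network where the gap is, and that emitting a single label per sentence (rather than a prediction at every time step) is what makes this fill-in-the-blank instead of next-word prediction. Is something like this the right way to frame the problem, or does the training data need a different structure?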