
I'm currently studying RNNs, in particular LSTMs, and I was trying to figure out how to implement a bidirectional LSTM to fill in a missing word in a sentence. I have a doubt about the structure of the train set to pass to the fit method of the model. If my list of sentences is composed of elements like "HI GUYS, <MISSING> ARE YOU?" and my target label is "HOW", how could the BI-LSTM understand that it has to predict the missing value and not the next element of the sentence? I saw here that the advantage of a bidirectional LSTM is the ability to look at both past and future tokens to get information about the context and better predict the target, but I still don't get how to implement this in practice. So my questions are:

  1. What is the structure of my train set?
  2. Does the BI-LSTM know what token to predict or do I have to specify it? And how?

1 Answer


how could the BI-LSTM understand that it has to predict the missing value and not the next element of the sentence?

If you train it to, then it should "understand" what you want it to do.

  1. Your train set would be the list of words in the sentence (in the correct order) without the missing word. You can also choose to replace the missing word with a special missing-word token, so that the model has one less task to do. Honestly, there are many ways to do this. The y will be the missing word (see the sketch below this answer).

  2. I don't really understand what you mean here, but I believe I have already answered it in my first point.
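As a minimal sketch of what this could look like in Keras: the toy sentences, the `[MISSING]` token, the `maxlen`, and the layer sizes are all hypothetical choices for illustration, not a fixed recipe. The model reads the whole sentence (gap included) in both directions and is trained so its output is the word that belongs in the gap.

```python
# Minimal sketch, assuming a missing-word token approach (hypothetical data).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# x: the sentence with the gap replaced by a [MISSING] token
# y: the word that belongs in the gap
sentences = ["hi guys [MISSING] are you", "[MISSING] morning everyone"]
targets = ["how", "good"]

tokenizer = Tokenizer(filters="")  # empty filters keep [MISSING] as one token
tokenizer.fit_on_texts(sentences + targets)
vocab_size = len(tokenizer.word_index) + 1

X = pad_sequences(tokenizer.texts_to_sequences(sentences), maxlen=6)
y = np.array([seq[0] for seq in tokenizer.texts_to_sequences(targets)])

model = Sequential([
    Embedding(vocab_size, 32),
    Bidirectional(LSTM(64)),                  # reads context left-to-right and right-to-left
    Dense(vocab_size, activation="softmax"),  # scores every word in the vocabulary
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=10)  # the model learns to output the gap word, not the next word
```

There is no explicit "predict at this position" flag: because every training pair maps a gapped sentence to its missing word, the network learns from the data that the `[MISSING]` token marks what it should fill in.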

  • Thanks for the answer, but honestly I still don't understand how the BI-LSTM can know at which position to make the prediction, and if I use a **missing-word token**, how do I tell the model that this is the token to predict? There's nothing about this in the Keras documentation. – Daniele Caliari Feb 12 '21 at 12:43