I recently came across OpenAI Five and was curious to see how the model is built and to understand it. I read on Wikipedia that it "contains a single layer with a 1024-unit LSTM". Then I found this PDF containing a diagram of the architecture.
My Questions
From all this I don't understand a few things:
What does it mean to have a 1024-unit LSTM layer? Does this mean we have 1024 time steps with a single LSTM cell, or does it mean we have 1024 cells? Could you show me some kind of diagram visualizing this? I'm especially having a hard time visualizing 1024 cells in one layer. (I tried looking at several SO questions, such as 1 and 2, and the OpenAI Five blog, but they didn't help much.)
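To make my confusion concrete, here is how I would build such a layer in PyTorch. My current guess is that "1024 units" just means the hidden state vector has size 1024 (the input size of 512 below is a placeholder I made up, not the real one):

```python
import torch
import torch.nn as nn

# My guess: a "1024-unit LSTM" is one layer whose hidden state has
# 1024 dimensions -- not 1024 separate cells or 1024 time steps.
lstm = nn.LSTM(input_size=512, hidden_size=1024, num_layers=1, batch_first=True)

x = torch.randn(1, 16, 512)   # (batch, time_steps, input features)
out, (h, c) = lstm(x)

print(out.shape)  # (1, 16, 1024): one 1024-dim output per time step
print(h.shape)    # (1, 1, 1024): final hidden state
```

Is this reading correct, or does "unit" mean something else here?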
How can you do reinforcement learning on such a model? I'm used to RL being done with Q-tables that get updated during training. Does this simply mean that their loss function is the reward?
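Here is a minimal policy-gradient (REINFORCE-style) sketch of how I imagine RL without a Q-table: the loss is not the reward itself, but the negative log-probability of the chosen action scaled by the reward. The observation and action sizes are made up, and I understand OpenAI Five actually uses PPO, which is more involved than this:

```python
import torch
import torch.nn as nn

# Toy policy network: 4 observation features -> 2 action logits.
policy = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

state = torch.randn(1, 4)     # toy observation from the environment
logits = policy(state)
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()        # sample an action from the policy
reward = 1.0                  # toy reward returned by the environment

# "Loss" weights the action's log-probability by the reward, so
# gradient descent on it increases the probability of rewarded actions.
loss = (-dist.log_prob(action) * reward).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Is this the right mental model, i.e. the reward enters the loss as a weighting term rather than being the loss itself?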
How come such a large model doesn't suffer from vanishing gradients or similar problems? I haven't seen any kind of normalization in the PDF.
In the PDF you can see a blue rectangle; it seems to be a unit, and there are N of those. What does this mean? And please correct me if I'm mistaken: are the pink boxes used to select the best move/item?
In general, all of this can be summarized as: "How does the OpenAI Five model work?"