I am programming a relatively small LSTM model in Google Colab.
For reference, I am using TensorFlow 1.13 to build the model, with tensorflow.keras as the Keras API.
from tensorflow.keras import layers as ll
from tensorflow.keras.models import Model

seq_len = 20000; n_classes = 4

inputs = ll.Input(shape=(seq_len,))
# word_index is the vocabulary built by the tokenizer on my corpus
x = ll.Embedding(len(word_index), 1000)(inputs)
x = ll.LSTM(units=100, activation='relu', return_sequences=True)(x)
outputs = ll.Dense(units=n_classes, activation='softmax')(x)

model = Model(inputs, outputs)
model.summary()
I have checked that I have 15 GB of GPU RAM available, and by my estimate the model with a batch size of 32 should fit in about 3 GB of RAM.
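For reference, a rough activation-size calculation (float32 forward activations only; weights, gradients and optimizer state not counted, so this is only a ballpark) lands near that figure:

# Forward activations at float32 (4 bytes each), batch size 32
batch = 32
embedding_out = batch * seq_len * 1000 * 4       # ~2.56 GB
lstm_out = batch * seq_len * 100 * 4             # ~0.26 GB
dense_out = batch * seq_len * n_classes * 4      # ~0.01 GB
print((embedding_out + lstm_out + dense_out) / 1e9)  # ~2.8 GB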
However, whenever I launch training, the server runs out of memory.
To be fair, I am using extremely long sequences (20000 is the maximum sequence length), but I would expect the recurrence to be handled symbolically rather than fully unrolled in memory, so the model should still fit.
Reducing the batch size to 1 does not help either.
What is going on? How can I make this model fit in memory?
EDIT: I tried reducing the sequence length to 2 and that indeed makes it fit in memory. But I need the sequence length to remain high. How can I tell TensorFlow not to unroll the network at any point? (I suspect that is what is happening behind the scenes; how can I check whether this is indeed the case?)
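For what it's worth, the Keras LSTM layer exposes an unroll argument, which as far as I know already defaults to False; setting and inspecting it explicitly would look like this, though I am not sure this is what controls the behaviour I am seeing:

# unroll=False (the default) should keep the recurrence as a symbolic graph loop
lstm_layer = ll.LSTM(units=100, activation='relu',
                     return_sequences=True, unroll=False)
print(lstm_layer.unroll)  # False, so the layer itself is not configured to unroll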
EDIT: If I remove the Softmax layer then the memory use drops back to the normal range. I think the Softmax layer is causing TensorFlow to unroll the network. TimeDistributing the Softmax does not help though.
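Concretely, the TimeDistributed variant I tried was along these lines (replacing the original output layer):

# Apply the Dense(softmax) head independently at every timestep
outputs = ll.TimeDistributed(
    ll.Dense(units=n_classes, activation='softmax'))(x)
model = Model(inputs, outputs)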