I am trying to tag letters in long char-sequences. The inherent structure of the data requires me to use a bidirectional approach.
On top of that, since I'm tagging every position, I need access to the hidden state at each timestep, not just the final one.
To test the idea I started with a fixed-length approach: I currently build batches of random slices of, say, 60 characters each, cut from my much longer sequences, and run my hand-made bidirectional classifier over them, with zero_state as the initial_state for each 60-character slice.
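
For reference, the current setup looks roughly like this (heavily simplified; the actual cell, sizes and the classification layer on top are different, and the names are just placeholders):

```python
import tensorflow as tf

batch_size, window_len, num_chars, hidden = 32, 60, 128, 100

# one-hot encoded random 60-character windows cut from the long sequences
inputs = tf.placeholder(tf.float32, [batch_size, window_len, num_chars])

cell_fw = tf.nn.rnn_cell.LSTMCell(hidden)
cell_bw = tf.nn.rnn_cell.LSTMCell(hidden)

# every window starts from a zero state, so context outside the window is lost
state = cell_fw.zero_state(batch_size, tf.float32)

# hand-rolled unrolling so I can keep the output of every timestep
steps = tf.unstack(inputs, axis=1)
fw_outputs = []
for x in steps:
    out, state = cell_fw(x, state)
    fw_outputs.append(out)
# ... same loop over reversed(steps) for cell_bw, then concat per timestep ...
```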
This worked fine, but obviously not perfectly, since in reality the sequences are longer and the information to the left and right of the slice I randomly cut from the original source is lost.
Now, in order to advance, I want to work with the entire sequences. They vary heavily in length, though, and there is no way I'll fit the entire sequences (batched, on top of that) onto the GPU.
I found the swap_memory parameter in the dynamic_rnn documentation. Would that help?
I didn't find any further documentation that helped me understand what it actually does. And I can't easily try it out myself, because I need access to the hidden states at each timestep and therefore built the current graph without any of the higher-level wrappers (such as dynamic_rnn). Trying it would require me to get all the intermediate states out of the wrapper, which, as I understand it, is a lot of work to implement.
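
If I understand the docs correctly, the call I would end up trying is something like the sketch below (names like `seq_len` are just mine; I'm assuming the `outputs` tensors, which seem to hold one vector per timestep, would give me the per-step hidden states I need):

```python
import tensorflow as tf

num_chars, hidden = 128, 100

# full sequences, padded to the longest one in the batch
inputs = tf.placeholder(tf.float32, [None, None, num_chars])  # [batch, time, features]
seq_len = tf.placeholder(tf.int32, [None])                    # true length of each sequence

cell_fw = tf.nn.rnn_cell.LSTMCell(hidden)
cell_bw = tf.nn.rnn_cell.LSTMCell(hidden)

# swap_memory=True should let TF move activations that are only needed
# for the backward pass from GPU to host memory, if I read the docs right
(out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, inputs,
    sequence_length=seq_len,
    swap_memory=True,
    dtype=tf.float32)

# out_fw / out_bw are [batch, time, hidden]: one output vector per timestep,
# which for an LSTM should be the per-step hidden output h_t
per_step = tf.concat([out_fw, out_bw], axis=-1)
```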
Before going through the hassle of trying this out, I would love to be sure that it would indeed solve my memory issue. Thanks for any hints!