
I have built this LSTM class:

import tensorflow as tf
import Constants


class LSTM():

    def __init__(self,
                 inputShape,
                 outputShape,
                 numLayers=Constants.numLayers,
                 numHidden=Constants.numHidden,
                 learningRate=Constants.learningRate,
                 forgetBias=Constants.forgetBias):
        # Batch dimension is left as None so it can be inferred per batch.
        self.inputs = tf.placeholder(tf.float32, [None] + inputShape)
        self.labels = tf.placeholder(tf.float32, [None] + outputShape)
        # Unstack [batch, time, features] into a time-ordered list for static_rnn.
        self.inputTensors = tf.unstack(self.inputs, axis=1)
        self.weights = tf.Variable(tf.random_normal([numHidden] + outputShape))
        self.bias = tf.Variable(tf.random_normal(outputShape))
        # One cell instance per layer; reusing a single cell object
        # ([cell] * numLayers) would make the layers share the same weights.
        layers = [tf.contrib.rnn.LSTMCell(numHidden, forget_bias=forgetBias, state_is_tuple=True)
                  for _ in range(numLayers)]
        self.cell = tf.contrib.rnn.MultiRNNCell(layers, state_is_tuple=True)
        self.optimiser = tf.train.GradientDescentOptimizer(learningRate)
        self.forgetBias = forgetBias
        self.batchDict = None
        self.outputs = None
        self.finalStates = None
        self.predictions = None
        self.loss = None
        self.accuracy = None
        self.optimise = None
        self.session = tf.Session()
        self.__buildGraph()

    def __buildGraph(self):
        outputs, self.finalStates = tf.nn.static_rnn(self.cell, self.inputTensors, dtype=tf.float32)
        predictions = tf.add(tf.matmul(outputs[-1], self.weights), self.bias)
        self.predictions = tf.minimum(tf.maximum(predictions, 0), 1)  # clamp to [0, 1]
        self.loss = tf.losses.mean_squared_error(predictions=self.predictions, labels=self.labels)
        self.accuracy = tf.reduce_mean(1 - tf.abs(self.labels - self.predictions) / 1.0)
        self.optimise = self.optimiser.minimize(self.loss)
        self.session.run(tf.global_variables_initializer())

    def __execute(self, operation):
        return self.session.run(operation, self.batchDict)

    def setBatch(self, inputs, labels):
        self.batchDict = {self.inputs: inputs, self.labels: labels}

    def batchLabels(self):
        return self.__execute(self.labels)

    def batchPredictions(self):
        return self.__execute(self.predictions)

    def batchLoss(self):
        return self.__execute(self.loss)

    def batchAccuracy(self):
        return self.__execute(self.accuracy)

    def processBatch(self):
        self.__execute(self.optimise)

    def kill(self):
        self.session.close()

and I run it like so:

import DataWorker
import Constants
from Model import LSTM

inputShape = [Constants.sequenceLength, DataWorker.numFeatures]
outputShape = [1]

lstm = LSTM(inputShape, outputShape)

# #############################################
# TRAINING
# #############################################
for epoch in range(Constants.numEpochs):
    print("***** EPOCH:", epoch + 1, "*****\n")
    IDPointer, TSPointer = 0, 0
    epochComplete = False
    batchNum = 0
    while not epochComplete:
        batchNum += 1
        batchX, batchY, IDPointer, TSPointer, epochComplete = DataWorker.generateBatch(IDPointer, TSPointer)
        lstm.setBatch(batchX, batchY)
        lstm.processBatch()
        if batchNum % Constants.printStep == 0 or epochComplete:
            print("Batch:\t\t", batchNum)
            print("Last Pred:\t", LSTM.batchPredictions()[-1][0])
            print("Last Label:\t", LSTM.batchLabels()[-1][0])
            print("Loss:\t\t", LSTM.batchLoss())
            print("Accuracy:\t", str("%.2f" % (LSTM.batchAccuracy() * 100) + "%\n"))

# #############################################
# TESTING
# #############################################
testX, testY = DataWorker.generateTestBatch()
lstm.setBatch(testX, testY)
testAccuracy = lstm.batchAccuracy()
print("Testing Accuracy:", "%.2f%%" % (testAccuracy * 100))

lstm.kill()

This all works as it should. However, I am using time series data consisting of financial stocks that span ranges of timestamps far greater than the number of time steps my LSTM is unrolled for, Constants.sequenceLength. Because of this, it takes many sequential batches for a single stock to be processed, and so the state/memory of my LSTM needs to be passed between batches. As well as this, after a batch that completes the lifespan of an ID, the next batch will pass in a new ID starting from the initial timestamp of my dataset, at which point I want to reset the memory.

There are many questions asking something similar, and all of the answers are adequate; however, none seem to address the issue of using variable batch sizes, i.e. batch sizes initialised to None and then inferred when a batch is passed in. My batches are usually a constant size, but they do change under certain circumstances, and I cannot change this. How can I have control over passing the state between batches, as well as resetting the state, if I have not specified the batch size?
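
For concreteness, here is a rough TF 1.x sketch of the kind of state plumbing I am imagining: the state is fed through placeholders whose batch dimension is None, so the numeric final state of one run can be fed back in as the initial state of the next, or replaced with zeros to reset it. All the names here are illustrative only, I use dynamic_rnn rather than static_rnn for brevity, and I do not know if this is the right approach:

import numpy as np
import tensorflow as tf

numLayers, numHidden, numFeatures = 2, 64, 10

# Both the batch and time dimensions are left as None.
inputs = tf.placeholder(tf.float32, [None, None, numFeatures])

cells = [tf.contrib.rnn.LSTMCell(numHidden, state_is_tuple=True) for _ in range(numLayers)]
cell = tf.contrib.rnn.MultiRNNCell(cells, state_is_tuple=True)

# One (c, h) placeholder pair per layer, batch dimension None, so a numeric
# state of any batch size can be fed in as the initial state.
statePlaceholders = tuple(
    tf.contrib.rnn.LSTMStateTuple(
        tf.placeholder(tf.float32, [None, numHidden]),
        tf.placeholder(tf.float32, [None, numHidden]))
    for _ in range(numLayers))

outputs, finalState = tf.nn.dynamic_rnn(cell, inputs, initial_state=statePlaceholders)

session = tf.Session()
session.run(tf.global_variables_initializer())

def zeroState(batchSize):
    # All-zero numeric state, e.g. for when a new stock ID starts.
    return tuple(
        tf.contrib.rnn.LSTMStateTuple(
            np.zeros([batchSize, numHidden], np.float32),
            np.zeros([batchSize, numHidden], np.float32))
        for _ in range(numLayers))

def runBatch(batchX, state):
    feed = {inputs: batchX}
    for placeholder, value in zip(statePlaceholders, state):
        feed[placeholder.c] = value.c
        feed[placeholder.h] = value.h
    # Returns the numeric final state so it can be fed back in next time.
    return session.run([outputs, finalState], feed)

state = zeroState(32)                                            # reset the memory
_, state = runBatch(np.random.rand(32, 5, numFeatures), state)
_, state = runBatch(np.random.rand(32, 5, numFeatures), state)   # memory carried over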

KOB
  • What about assigning batch size as a placeholder? `batch_size = tf.placeholder(tf.int32, [], name='batch_size')` – ARAT Jun 25 '18 at 06:50
  • @MustafaMuratARAT This is what I did in the end, and then in my `setBatchDict()` method I just set `batchSize: len(inputs)` – KOB Jun 25 '18 at 06:56
  • perfect! I have the same issue when my training set has a few extra observations to be put in a batch. – ARAT Jun 25 '18 at 07:01
  • UPDATE: actually that solution did not work for me; with a multi-layered LSTM and a varying batch size, using [this solution](https://stackoverflow.com/questions/37969065/tensorflow-best-way-to-save-state-in-rnns), I get the error `initial_value must have a shape specified: Tensor("MultiRNNCellZeroState/DropoutWrapperZeroState/LSTMCellZeroState/zeros:0", dtype=float32)` – ARAT Jun 27 '18 at 01:35
  • @MustafaMuratARAT, when the model is first initialised, you must set the state as all zeros, and to do so, the batch size is needed. You can still use the batch size as a placeholder to be determined at runtime though. See one of my LSTMs here: https://github.com/KevOBrien/LSTM - specifically the `__buildTensorFlowGraph()` and `resetState()` methods. – KOB Jun 27 '18 at 11:33
  • Also note that there is a memory leak when using that model as some of the operations create a new node on the graph each iteration, rather than replacing the previous one. Writing a TensorFlow model in an OOP style is very difficult! – KOB Jun 27 '18 at 12:52
  • Oh, I absolutely agree with you. I am pretty inexperienced with TensorFlow, and sometimes my model runs out of memory. Getting back to my problem: what you did is actually very similar to what I am doing, and I am still getting the same error. It all makes sense though, because the batch size should stay the same: you are feeding an LSTM cell a matrix whose dimensions are [batch_size x num_neurons]. – ARAT Jun 28 '18 at 15:09
  • Yes but the parameters of the network are duplicated over every example in the batch and updated according to the cost over the entire batch, so it is perfectly fine to change the batch size from batch to batch – KOB Jun 28 '18 at 15:13
  • Could you please elaborate on the duplication of parameters in the network? I am sorry I ask so many questions. As far as I know, weights are shared across time in an LSTM cell, is that not correct? So do they not have a fixed size, and so do we not need a fixed batch size? Am I missing something here? – ARAT Jun 28 '18 at 15:23
  • Yes, but they are also shared across the entire batch, so whether you have a batch size of 1 in one iteration and a batch size of 100 in the next, it doesn't matter: the weights will be adjusted according to the entire batch. – KOB Jun 28 '18 at 15:31
  • @MustafaMuratARAT After reading your last comment again, I think you may be confused over the terms batch size and timesteps/number of unrollings through time. You should research the difference as they are two completely different concepts and have nothing to do with each other. Timesteps are only relevant to RNNs as inputs are in sequences through time, whereas the batch size is relevant to any type of NN - it just allows for Stochastic Gradient Descent where some number of examples in the data set are processed in parallel – KOB Jul 01 '18 at 08:16
  • I am totally aware of the difference between batch size and time steps. I use LSTMs and other types of NN for different purposes. However, a dynamic batch size, where batches have different numbers of samples, is something I want to learn about if possible. What I meant was that it does not seem possible, because of the dimensions of the weight matrices for the input and the hidden state of the previous layer. – ARAT Jul 01 '18 at 15:21
  • If you change the batch size from 1 to 2, the entire LSTM is duplicated over the two inputs. All the weight matrices and every other parameter in the whole network are duplicated across every example in the batch. Say you have a batch size of 2: think of it as passing the first input through the network and calculating the loss, then passing the second input through the network and calculating the loss, where the backprop only happens after both inputs are processed and the weights are adjusted according to the average of both losses. This is all parallelised, not done sequentially. A minimal sketch of this point follows below. – KOB Jul 01 '18 at 16:04
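
A minimal sketch of the point in that last comment, combined with the batch-size-placeholder idea from earlier in the thread: the cell's variables are created once, independently of the batch dimension, so the same graph can be run with any batch size. The shapes and names here are arbitrary, for illustration only:

import numpy as np
import tensorflow as tf

numHidden, numFeatures = 8, 4

# Batch size as a scalar placeholder; it is only used to build the initial
# zero state, so it can differ from run to run.
batchSize = tf.placeholder(tf.int32, [], name='batch_size')
inputs = tf.placeholder(tf.float32, [None, None, numFeatures])
cell = tf.contrib.rnn.LSTMCell(numHidden)
initialState = cell.zero_state(batchSize, tf.float32)
outputs, _ = tf.nn.dynamic_rnn(cell, inputs, initial_state=initialState)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    # The exact same variables process a batch of 1 and then a batch of 100.
    for n in (1, 100):
        x = np.random.rand(n, 5, numFeatures)
        print(session.run(outputs, {inputs: x, batchSize: n}).shape)  # (n, 5, numHidden)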

0 Answers