
I am new to Keras and just started working on some examples. I am dealing with the following problem: I have 4032 samples and use about 650 of them for fitting, i.e. the training stage, and then use the rest for testing the model. The problem is that I keep getting the following error:

Exception: In a stateful network, you should only pass inputs with a number of samples that can be divided by the batch size.

I understand why I am getting this error; my question is, what if the size of my data is not divisible by batch_size? I used to work with Deeplearning4j's LSTM and did not have to deal with this problem. Is there any way to get around this?
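
For context, here is a minimal sketch of the kind of stateful setup that triggers this (not my exact code; the layer size, timesteps, and feature count are placeholders):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    batch_size = 64
    timesteps, features = 10, 1

    model = Sequential()
    # In a stateful LSTM the batch size is fixed via batch_input_shape, so every array
    # passed to fit/evaluate/predict must have a length divisible by batch_size
    model.add(LSTM(32, batch_input_shape=(batch_size, timesteps, features), stateful=True))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')

    X_train = np.zeros((650, timesteps, features))
    y_train = np.zeros((650, 1))
    model.fit(X_train, y_train, batch_size=batch_size)  # 650 % 64 != 0 -> raises the exception above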

Thanks

ahajib
  • As far as getting around it is concerned, change the batch size. If the number of samples is a prime number, drop 1 or 2 examples. Regarding why this error occurs in Keras and not in Deeplearning4j, I am not sure. – Autonomous Jun 22 '16 at 17:07
  • Thanks for the suggestion but I was kind of hoping to get results without having to drop some samples. – ahajib Jun 22 '16 at 17:13
  • You don't have to drop samples. 650 is not a prime number. If your total number of samples is a prime number, then it won't matter what batch size you choose; it will not be divisible. In your case, you can choose a batch size of 5, 10, 65, etc. Is that a real issue for you? In my experience, changing the batch size within reasonable limits won't affect performance too much. – Autonomous Jun 22 '16 at 17:17
  • Sometimes the input size may be a prime number, in which case I have to choose a different batch size. – ahajib Jun 22 '16 at 17:24
  • Also, this is a requirement only in stateful networks in Keras. I worked with Keras extensively for implementing CNNs. I didn't have any such requirement then. – Autonomous Jun 22 '16 at 17:47
  • So does that mean if I switch to ```stateful=False``` then this would no longer be an issue? Btw, if stateful is False, is the model still an LSTM? I am using the network from one of the examples (stateful_lstm.py). Sorry if my questions are simple but I am a newbie :) Thanks – ahajib Jun 22 '16 at 18:11
  • No. Don't make any changes in the network architecture. In my opinion, you are overthinking this issue. If you have 650 training samples, make the batch size 50, 65, etc. Otherwise, drop one or two samples to make the count divisible by the batch size (for example, with 743 samples, it's prime, so no batch size will help; drop one sample to make it 742, which is divisible). Neural network performance won't be affected by one or ten samples more or less. If you have a dataset where removing 10 samples means removing 10% of the data, maybe you should think of some other method than neural networks. – Autonomous Jun 22 '16 at 18:17
  • The thing is that I am dealing with 50 datasets, each with a different size, and I must use a certain number of samples for testing (due to some benchmark restrictions). For now, I'll stick to a batch size of 64 and try to make the number of samples divisible by that (see the short trimming sketch after these comments). Also, any useful reference so I can read more about stateful networks? Once again, thank you so much. – ahajib Jun 22 '16 at 18:38
  • Curious: Did you stop using DL4J? If so, why? – racknuf Jun 22 '16 at 19:44
  • @tremstat No, I just wanted to compare the results of both – ahajib Jun 22 '16 at 19:59
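
To make the trimming suggestion from the comments concrete, here is a small sketch (array names are placeholders) that drops the trailing samples so the count is divisible by the chosen batch size:

    batch_size = 64
    # keep only the largest multiple of batch_size; at most batch_size - 1 samples are dropped
    n_usable = (len(X_train) // batch_size) * batch_size
    X_train, y_train = X_train[:n_usable], y_train[:n_usable]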

1 Answer


The simplest solution is to use fit_generator instead of fit. I wrote a simple data-loader class that can be inherited to do more complex things. It would look something like this, with get_next_batch_data redefined to produce whatever your data is, including things like augmentation, etc.

import random

class BatchedLoader():
    def __init__(self, x, y, batchsize):
        # x and y are numpy arrays; inherit from this class for more complex data
        self.x, self.y = x, y
        self.batchsize = batchsize
        self.possible_indices = list(range(len(x)))  # [0, 1, 2, ..., N-1] (say N = 33)
        self.cur_it = 0
        self.cur_epoch = 0

    def get_batch_indices(self):
        batch_indices = self.possible_indices[self.cur_it : self.cur_it + self.batchsize]
        self.cur_it += self.batchsize
        # If len(batch_indices) < batchsize, you've reached the end of the epoch:
        # reset cur_it to 0, increase cur_epoch, shuffle possible_indices if wanted,
        # and top the batch up with the remaining K = batchsize - len(batch_indices) indices
        if len(batch_indices) < self.batchsize:
            self.cur_it = 0
            self.cur_epoch += 1
            random.shuffle(self.possible_indices)
            k = self.batchsize - len(batch_indices)
            batch_indices = batch_indices + self.possible_indices[:k]
        return batch_indices

    def get_next_batch_data(self):
        # The data points corresponding to those indices are your next batch data;
        # redefine this method to add augmentation etc.
        batch_indices = self.get_batch_indices()
        return self.x[batch_indices], self.y[batch_indices]
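
For completeness, a rough sketch of hooking such a loader into fit_generator (assuming a compiled Keras model, numpy arrays X_train / y_train, and Keras 2-style argument names; the wrapper generator is just illustrative):

    def batch_generator(loader):
        # Keras expects the generator to yield (x, y) batches forever
        while True:
            yield loader.get_next_batch_data()

    loader = BatchedLoader(X_train, y_train, batchsize=64)
    model.fit_generator(batch_generator(loader),
                        steps_per_epoch=len(X_train) // 64,
                        epochs=10)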
Prophecies
  • Does it mean that you repeat the training with some of the samples from the beginning of the data? – Daisy Apr 28 '19 at 11:25
  • Yeah, either that or you can just have a smaller final batch. The vast majority of ops are agnostic to the first dimension (i.e., batch size) anyway. – Prophecies Apr 30 '19 at 14:03