
I am using Keras with a TensorFlow backend in Python. To be more precise: tensorflow 1.2.1 and its built-in contrib.keras lib.

I want to use the fit_generator method of a Sequential model object, but I am confused with what I should pass as the method-parameters.

From reading the doc here I got the following information:

  • generator : a python training data batch generator; endlessly looping over its training data
  • validation_data : in my case, a python validation data batch generator; the doc doesn't mention endless looping over its validation data
  • steps_per_epoch : number of training batches = uniqueTrainingData / batchSize
  • validation steps : ??? ; = uniqueValidationData / batch size ???
  • use_multiprocessing : boolean; don't pass non-picklable arguments ???
  • workers : max number of used processes
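For reference, a minimal endless training generator of the kind the doc describes might look like this (the array shapes and names are made up for illustration):

```python
import numpy as np

def batch_generator(x, y, batch_size):
    """Endlessly yield (inputs, targets) tuples, as fit_generator expects."""
    n = len(x)
    while True:  # loop forever; Keras stops pulling after steps_per_epoch batches
        for start in range(0, n, batch_size):
            yield x[start:start + batch_size], y[start:start + batch_size]

x_train = np.random.rand(100, 4)
y_train = np.random.rand(100, 2)
gen = batch_generator(x_train, y_train, batch_size=10)
xb, yb = next(gen)  # one batch: xb has shape (10, 4), yb has shape (10, 2)
```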

As indicated above with ???, I don't really know what validation_steps means. I know the definition from the linked doc ("Number of steps to yield from validation generator at the end of every epoch"), but that only confuses me in the given context. From the doc I know that the validation_data generator has to yield data/label tuples in the form (inputs, targets). In contrast, the statement above indicates that multiple "steps to yield from validation generator at the end of every epoch" are expected, which in this context would mean that multiple validation batches are yielded after each training epoch.

Questions about validation_steps:

  • Does it really work that way? If so: why? I thought that after each epoch one validation batch, ideally one that wasn't used before, is used for validation, to ensure that the training gets validated without the risk of "training" the model to perform better on already used validation sets.
  • In the context of the previous question: why is the recommended number of validation steps uniqueValidationData / batchSize and not uniqueValidationData / epochs? Isn't it better to have e.g. 100 validation batches for 100 epochs instead of x validation batches, where x could be less or more than the specified number of epochs? Alternatively: if you have far fewer validation batches than epochs, is the model trained without validation for the rest of the epochs, or do validation sets get reused / reshuffled and reused?
  • Is it important that the training and validation batches have the same batch size (i.e. a shared divisor of trainingDataCount and validationDataCount)?

Additional question about use_multiprocessing:

  • Are numpy arrays picklable or do I have to convert them to multidimensional lists?
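(For context, a quick round-trip check, independent of Keras, of whether a numpy array survives pickling:)

```python
import pickle
import numpy as np

arr = np.arange(12).reshape(3, 4)
restored = pickle.loads(pickle.dumps(arr))  # serialize, then deserialize
ok = bool((restored == arr).all())  # True if the array survived intact
```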
Philipp Lange

1 Answer

The validation generator works exactly like the training generator. You define how many batches it will yield per epoch.

  • The training generator will yield steps_per_epoch batches.
  • When the epoch ends, the validation generator will yield validation_steps batches.

But validation data has absolutely no relation to training data. There is no need to separate validation batches according to training batches (I would even say that there is no point in doing that, unless you have a very specific intention). Also, the total number of samples in training data is not related to the total number of samples in validation data.

The point of having many batches is just to spare your computer's memory, so you test smaller packs one at a time. You probably find a batch size that fits your memory or expected training time and use that size.

That said, Keras leaves this entirely up to you, so you can determine the training and the validation batches as you wish.

Epochs:

Ideally, you use all your validation data at once. If you use only part of your validation data, you will get different metrics for each batch, which may make you think that your model got worse or better when it actually didn't; you just measured different validation sets.

That's why they suggest validation_steps = total_validation_samples // validation_batch_size.
Theoretically, you test your entire data every epoch, as you theoretically should also train your entire data every epoch.

So, theoretically, each epoch yields:

  • steps_per_epoch = TotalTrainingSamples / TrainingBatchSize
  • validation_steps = TotalValidationSamples / ValidationBatchSize

Basically, the two vars are: how many batches per epoch you will yield.
This makes sure that at each epoch:

  • You train exactly your entire training set
  • You validate exactly your entire validation set
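As a concrete sketch with made-up sample counts (substitute your own dataset sizes and generators):

```python
# Illustrative numbers only.
total_training_samples = 1000
total_validation_samples = 200
training_batch_size = 20
validation_batch_size = 20

steps_per_epoch = total_training_samples // training_batch_size        # 50 batches
validation_steps = total_validation_samples // validation_batch_size   # 10 batches

# model.fit_generator(train_gen, steps_per_epoch=steps_per_epoch,
#                     validation_data=val_gen,
#                     validation_steps=validation_steps, epochs=10)
```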

Nevertheless, it's totally up to you how you separate your training and validation data.

If you do want to have one different batch per epoch (epochs using less than your entire data), that's OK; just pass steps_per_epoch=1 or validation_steps=1, for instance. The generator is not reset after each epoch, so the second epoch will take the second batch, and so on, until it loops back to the first batch.
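This cycling behavior can be seen with a toy generator; with steps_per_epoch=1, each "epoch" consumes the next batch in order and eventually wraps around:

```python
def toy_generator():
    batches = ["batch_0", "batch_1", "batch_2"]
    while True:               # endless loop, as Keras requires
        for b in batches:
            yield b

gen = toy_generator()
# Simulate 4 epochs with steps_per_epoch=1: one next() call per epoch.
# The generator is not reset, so after batch_2 it wraps back to batch_0.
seen = [next(gen) for _ in range(4)]
```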

I prefer training the entire data per epoch, and if the time is too long, I use a callback that shows the logs at the end of each batch:

from keras.callbacks import LambdaCallback

callbacks = [LambdaCallback(on_batch_end=lambda batch, logs: print(logs))]

Multiprocessing

I was never able to use use_multiprocessing=True, it freezes at the start of the first epoch.

I've noticed that workers is related to how many batches are preloaded from the generator. If you define max_queue_size=1, you will have exactly workers batches preloaded.

They suggest you use Keras Sequences when multiprocessing. A Sequence works pretty much like a generator, but it keeps track of the order/position of each batch.
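A minimal sketch of such a Sequence (written as a plain class here so the snippet runs without Keras installed; in real code you would subclass keras.utils.Sequence, and the shapes are illustrative):

```python
import numpy as np

class BatchSequence:  # in real code: class BatchSequence(keras.utils.Sequence)
    def __init__(self, x, y, batch_size):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # Number of batches per epoch; Keras reads this instead of steps_per_epoch.
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        # Return batch number idx; Keras tracks the order/position for you.
        s = idx * self.batch_size
        return self.x[s:s + self.batch_size], self.y[s:s + self.batch_size]

seq = BatchSequence(np.zeros((105, 4)), np.zeros((105, 2)), batch_size=10)
# 105 samples / batch size 10 -> 11 batches; the last one holds 5 samples.
```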

Daniel Möller
  • Thanks a lot. I'm aware that training and validation data are not directly related; I simply got confused by what the doc's parameter descriptions really meant for me. Also thanks for the clarification concerning the optimal use of validation batches and multiprocessing. – Philipp Lange Aug 29 '17 at 17:57
  • I did some correction in the `step` vars above; they're divided by the batch size instead of the number of batches. The whole idea is unchanged, just the formula was wrong. – Daniel Möller Aug 29 '17 at 21:29
  • @DanielMöller Still I am confused with your answer. Lets say I set my `steps_per_epochs = 25 & epoch= 100 & validation_step = 3`. For every epoch, there were 25 steps and for each step, generator yielded training data of shape `X_train : (233, 100, 4) & Y_train : (233, 100, 2)` and training happens. The above process continues for every 25 steps and at the end of 25th step validation starts where the generator yield `X_validate: (33,100,4) & Y_validate : (33, 100, 2)` `3 times` and `validation acc & loss` printed in result. – Mari Jul 23 '19 at 18:30
  • @DanielMöller My question is : 1. What will be `batch_size in my case (for both training & Validation)` ? 2. During validation, the generator yields `3 times X_validate & Y_validate arrays`,since i have given `validation_steps = 3`. So how does loss and val_acc calculated ? Whether it will be calculated for every step & finally average the results ? or some other method ? – Mari Jul 23 '19 at 18:44
  • Batch size = 233 and 33 respectively. I'm not sure how Keras calculates the loss. Probably average of each batch. – Daniel Möller Jul 23 '19 at 18:48
  • @DanielMöller How come the batch_size = 233 & 33? As per the formula `steps_per_epoch = TotalTrainingSamples / TrainingBatchSize` you mentioned above, `TrainingBatchSize = TotalTrainingSamples / steps_per_epoch`, so in that case my `TrainingBatchSize = 233/25 = 9.32`? Is my understanding right? – Mari Jul 23 '19 at 18:58
  • There is no "rule", it's all free. The batch size is the size your generator is yielding. You are yielding `233` and `33`. Those are your batch sizes. If you want to have one epoch cycling exactly your number of total samples, then you try to make your generator yield `batch_size = total_samples // steps_per_epoch`, and consequently you will have `number_of_batches = steps_per_epoch = total_samples // batch_size`. – Daniel Möller Jul 23 '19 at 19:06
  • @DanielMöller My Original Training is of shape `X_train : (233, 100, 4) & Y_train : (233, 100, 2)`and the generator yield a tuple which is also the same shape like above... For better understanding I have attached my model picture in link below for your reference, please look and give your comments – Mari Jul 23 '19 at 19:54
  • @DanielMöller - Picture 1 :data shape (Training, Validation & Testing) https://imgur.com/uMCLqWO Picture 2: LSTM Model training, randomly select the length and training. Objective : To predict the class earlier so I follow this structure https://imgur.com/IYLGfEo Picture 3 : Model Summary https://imgur.com/R1pTL9F Picture 4: Every 25 steps generator yields a data and training continues I have attached image for 1 epoch https://imgur.com/L6xt37i Picture 5 : At the end of 25th step in 1st epoch validation starts and loss & val_acc calculated https://imgur.com/k4rp67G – Mari Jul 23 '19 at 19:58
  • Why are you using a generator if you are yielding the entire data at once? – Daniel Möller Jul 23 '19 at 20:00
  • @DanielMöller Objective of my project : I would like to predict the class if I give only partial input into the model. I have implemented my model based on the answers what I got from here https://datascience.stackexchange.com/questions/26366/training-an-rnn-with-examples-of-different-lengths-in-keras (answered by @Kbrose) In such a way, i should train my training data with variable length sequence which corresponds to particular class. So to achieve, I used Generator. – Mari Jul 23 '19 at 20:05
  • I think this may be simpler than you think. You created a generator. You know what it outputs. So, you know how many batches it should yield until it completes a cycle. Just use `steps_per_epoch` as that number. If your generator is a `keras.utils.Sequence`, you don't even need a `steps_per_epoch`. – Daniel Möller Jul 24 '19 at 09:51
  • @DanielMöller I follow the same method what `krbrose` used in datascience.stackexchange.com/questions/26366/… so could you just tell me what will be `validation_step` based on his code. – Mari Jul 24 '19 at 13:18