I have a video of 8000 frames, and I'd like to train a Keras model on batches of 200 frames each. I have a frame generator that loops through the video frame by frame, accumulates the (3 x 480 x 640) frames into a numpy array X of shape (200, 3, 480, 640), i.e. (batch size, rgb, frame height, frame width), and yields X and Y every 200th frame:
import cv2
import numpy as np
...
def _frameGenerator(videoPath, dataPath, batchSize):
    """
    Yield X and Y data when the batch is filled.
    """
    camera = cv2.VideoCapture(videoPath)
    width = int(camera.get(3))   # Frame width (returned as a float).
    height = int(camera.get(4))  # Frame height (returned as a float).
    frameCount = int(camera.get(7))  # Number of frames in the video file.
    truthData = _prepData(dataPath, frameCount)
    X = np.zeros((batchSize, 3, height, width))
    Y = np.zeros((batchSize, 1))
    batch = 0
    for frameIdx, truth in enumerate(truthData):
        ret, frame = camera.read()
        if ret is False: continue
        batchIndex = frameIdx % batchSize
        # OpenCV returns (height, width, channels); reorder to (channels, height, width).
        X[batchIndex] = frame.transpose(2, 0, 1)
        Y[batchIndex] = truth
        if batchIndex == batchSize - 1:  # The batch is full.
            batch += 1
            print("now yielding batch %d" % batch)
            yield X, Y
Here's how I run fit_generator():
batchSize = 200
print("Starting training...")
model.fit_generator(
    _frameGenerator(videoPath, dataPath, batchSize),
    samples_per_epoch=8000,
    nb_epoch=10,
    verbose=args.verbosity
)
My understanding is that an epoch finishes when samples_per_epoch samples have been seen by the model, and samples_per_epoch = batch size * number of batches = 200 * 40 = 8000. So after training for an epoch on frames 0-7999, the next epoch will start training again from frame 0. Is this correct?
With this setup I expect 40 batches (of 200 frames each) to be passed from the generator to fit_generator per epoch; this would be 8000 total frames per epoch, i.e., samples_per_epoch=8000. Then for subsequent epochs, fit_generator would reinitialize the generator such that we begin training again from the start of the video. Yet this is not the case. After the first epoch is complete (after the model logs batches 0-24), the generator picks up where it left off. Shouldn't the new epoch start again from the beginning of the training dataset?
If there is something incorrect in my understanding of fit_generator, please explain. I've gone through the documentation, this example, and these related issues. I'm using Keras v1.0.7 with the TensorFlow backend. This issue is also posted in the Keras repo.