
Is it possible in Keras to load only one batch into memory at a time? I have a 40 GB dataset of images.

If the dataset were small, I could use ImageDataGenerator to generate batches, but with a dataset this large I can't load all the images into memory.

Is there any method in Keras that does something similar to the following TensorFlow code?

path_queue = tf.train.string_input_producer(input_paths, shuffle= False)
paths, contents = reader.read(path_queue)
inputs = decode(contents)
input_batch = tf.train.batch([inputs], batch_size=2)

I use this method to serialize inputs in TensorFlow, but I don't know how to achieve the same thing in Keras.

Mohbat Tharani

1 Answer

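Keras models have the method fit_generator(). It accepts a Python generator or a Keras Sequence as input. (In recent TensorFlow releases fit_generator() is deprecated, and model.fit() accepts the same generators and Sequences directly.)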

You can create a simple generator like this:

fileList = listOfFiles  # list of image file paths

def imageLoader(files, batch_size):
    L = len(files)

    # Loop forever: Keras expects the generator to keep yielding across epochs.
    while True:
        batch_start = 0

        while batch_start < L:
            # Clamp the end index, so the last batch may be smaller than batch_size.
            batch_end = min(batch_start + batch_size, L)
            X = someMethodToLoadImages(files[batch_start:batch_end])
            Y = someMethodToLoadTargets(files[batch_start:batch_end])

            # A tuple of two NumPy arrays, each holding up to batch_size samples.
            yield (X, Y)

            batch_start += batch_size

And fit like this:

model.fit_generator(imageLoader(fileList, batch_size), steps_per_epoch=..., epochs=..., ...)

Normally, you pass to steps_per_epoch the number of batches that make up one full pass over the data, which is the number of files divided by batch_size, rounded up.
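As a rough illustration of that rule, here is a minimal sketch of the calculation. The helper name steps_for and the dataset size are made up for the example; only the rounding-up logic matters.

```python
import math

def steps_for(num_samples, batch_size):
    """Number of batches needed to cover num_samples exactly once."""
    # Round up so the final, possibly smaller, batch is still counted.
    return math.ceil(num_samples / batch_size)

num_files = 1050   # hypothetical dataset size
batch_size = 32    # hypothetical batch size
steps_per_epoch = steps_for(num_files, batch_size)
print(steps_per_epoch)
```

With the generator above, this value keeps each epoch aligned with one pass over fileList, because the generator restarts from the first file after the last batch.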

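You can also implement your own Keras Sequence. It's a little more work, but the Keras documentation recommends it if you are going to use multiprocessing, because a Sequence guarantees that each sample is used only once per epoch, which a plain generator does not.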
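A Sequence only needs __len__ (batches per epoch) and __getitem__ (one batch by index). The following is a minimal sketch under some assumptions, not a definitive implementation: the class name ImageSequence and the load_images / load_targets callables are hypothetical stand-ins for your own loading code, and the fallback import exists only so the sketch can be run without TensorFlow installed.

```python
import math
import numpy as np

try:
    from tensorflow.keras.utils import Sequence
except ImportError:
    # Assumption for this sketch only: fall back to a plain class
    # so the indexing logic can be tried without TensorFlow.
    Sequence = object

class ImageSequence(Sequence):
    """Loads one batch of files from disk per index (hypothetical example)."""

    def __init__(self, files, batch_size, load_images, load_targets):
        super().__init__()
        self.files = list(files)
        self.batch_size = batch_size
        self.load_images = load_images    # callable: list of paths -> array
        self.load_targets = load_targets  # callable: list of paths -> array

    def __len__(self):
        # Batches per epoch, counting a smaller final batch.
        return math.ceil(len(self.files) / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        batch = self.files[start:start + self.batch_size]
        return self.load_images(batch), self.load_targets(batch)

# Dummy loaders standing in for real image decoding.
def fake_images(paths):
    return np.zeros((len(paths), 8, 8, 3), dtype="float32")

def fake_targets(paths):
    return np.zeros((len(paths),), dtype="int64")

seq = ImageSequence([f"img_{i}.png" for i in range(10)], 4,
                    fake_images, fake_targets)
X_last, Y_last = seq[len(seq) - 1]
print(len(seq), X_last.shape, Y_last.shape)
```

Because the object reports its own length, steps_per_epoch can be left out when it is passed to the fit call.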

Daniel Möller