
I have the following code:

import numpy as np
from sklearn.model_selection import train_test_split
from scipy.misc import imresize

def _chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]


def _batch_generator(data, batch_size):
    indexes = range(len(data))
    index_chunks = _chunks(indexes, batch_size)
    for i, indexes in enumerate(index_chunks):
        print("\nLoaded batch {0}\n".format(i + 1))
        batch_X = []
        batch_y = []
        for index in indexes:
            record = data[index]
            image = _read_train_image(record["id"], record["index"])
            mask = _read_train_mask(record["id"], record["index"])
            mask_resized = imresize(mask, (1276, 1916)) >= 123
            mask_reshaped = mask_resized.reshape((1276, 1916, 1))
            batch_X.append(image)
            batch_y.append(mask_reshaped)
        np_batch_X = np.array(batch_X)
        np_batch_y = np.array(batch_y)
        yield np_batch_X, np_batch_y


def train(data, model, batch_size, epochs):
    train_data, test_data = train_test_split(data)
    samples_per_epoch = len(train_data)
    steps_per_epoch = samples_per_epoch // batch_size
    print("Train on {0} records ({1} batches)".format(samples_per_epoch, steps_per_epoch))
    train_generator = _batch_generator(train_data, batch_size)
    model.fit_generator(train_generator, 
                        steps_per_epoch=steps_per_epoch, 
                        nb_epoch=epochs, 
                        verbose=1)

train(train_indexes[:30], autoencoder,
      batch_size=2,
      epochs=1)

So it seems like it should work as follows:

  • get 30 indexes from the dataset (just an example)
  • split them into 22 train records and 8 validation records (not used yet)
  • split the train indexes into batches of 2 in the generator (so 11 batches), and that part works: len(list(_batch_generator(train_indexes[:22], 2))) really returns 11 (see the quick check after this list)
  • fit the model:
    • on batches produced by train_generator (in my case, 11 batches of 2 images each)
    • with 11 batches per epoch (steps_per_epoch=steps_per_epoch)
    • and 1 epoch (nb_epoch=epochs, epochs=1)
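
A quick sanity check on the chunking alone (no images are loaded) confirms that count:

print(len(list(_chunks(range(22), 2))))  # prints 11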

But the output looks like this:

Train on 22 records (11 batches)
Epoch 1/1

Loaded batch 1

C:\Users\user\venv\machinelearning\lib\site-packages\ipykernel_launcher.py:39: UserWarning: The semantics of the Keras 2 argument `steps_per_epoch` is not the same as the Keras 1 argument `samples_per_epoch`. `steps_per_epoch` is the number of batches to draw from the generator at each epoch. Basically steps_per_epoch = samples_per_epoch/batch_size. Similarly `nb_val_samples`->`validation_steps` and `val_samples`->`steps` arguments have changed. Update your method calls accordingly.
C:\Users\user\venv\machinelearning\lib\site-packages\ipykernel_launcher.py:39: UserWarning: Update your `fit_generator` call to the Keras 2 API: `fit_generator(<generator..., steps_per_epoch=11, verbose=1, epochs=1)`

Loaded batch 2

1/11 [=>............................] - ETA: 11s - loss: 0.7471
Loaded batch 3


Loaded batch 4


Loaded batch 5


Loaded batch 6

2/11 [====>.........................] - ETA: 17s - loss: 0.7116
Loaded batch 7


Loaded batch 8


Loaded batch 9


Loaded batch 10

3/11 [=======>......................] - ETA: 18s - loss: 0.6931
Loaded batch 11

Exception in thread Thread-50:
Traceback (most recent call last):
File "C:\Anaconda3\Lib\threading.py", line 916, in _bootstrap_inner
    self.run()
File "C:\Anaconda3\Lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
File "C:\Users\user\venv\machinelearning\lib\site-packages\keras\utils\data_utils.py", line 560, in data_generator_task
    generator_output = next(self._generator)
StopIteration

4/11 [=========>....................] - ETA: 18s - loss: 0.6663
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-16-092ba6eb51d2> in <module>()
    1 train(train_indexes[:30], autoencoder,
    2       batch_size=2,
----> 3       epochs=1)

<ipython-input-15-f2fec4e53382> in train(data, model, batch_size, epochs)
    37                         steps_per_epoch=steps_per_epoch,
    38                         nb_epoch=epochs,
---> 39                         verbose=1)

C:\Users\user\venv\machinelearning\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
    85                 warnings.warn('Update your `' + object_name +
    86                               '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 87             return func(*args, **kwargs)
    88         wrapper._original_function = func
    89         return wrapper

C:\Users\user\venv\machinelearning\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, initial_epoch)
1807                 batch_index = 0
1808                 while steps_done < steps_per_epoch:
-> 1809                     generator_output = next(output_generator)
1810 
1811                     if not hasattr(generator_output, '__len__'):

StopIteration: 

So as far as I can see, all batches are read successfully (see the "Loaded batch" lines).

But StopIteration is raised by Keras while processing batch 3 of epoch 1.
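
The exhaustion itself is easy to reproduce without Keras (a quick sketch; train_data here is the 22-record train split from above):

gen = _batch_generator(train_data, 2)
for _ in range(11):
    next(gen)    # all 11 real batches come out fine
next(gen)        # a 12th request raises StopIteration

The Thread-50 traceback above is Keras's background prefetch thread making exactly this kind of extra request.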

  • If your invocation of model.fit_generator is raising StopIteration at the same sample count every time, then double-check that the generator you've passed to fit_generator produces the same number of samples as specified in the steps_per_epoch parameter. That was the cause of this error on my side. – Kamil Stadryniak Jun 26 '19 at 21:27

3 Answers


I also ran into this problem, and found that you can wrap the body of the data generator function in a "while True" loop so that it never runs out of batches. I can no longer find the source I got this from, but you can refer to my code below:

import sys
import numpy as np

def batch_generator(inputs, targets, batchsize, shuffle=False):
    assert len(inputs) == len(targets)
    while True:  # never exhausts: Keras keeps pulling batches across epochs
        indices = np.arange(len(inputs))
        if shuffle:
            np.random.shuffle(indices)
        if batchsize > len(indices):
            sys.stderr.write('BatchSize out of index size\n')
            batchsize = len(indices)  # clamp the batch size to the dataset size
        for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
            if shuffle:
                excerpt = indices[start_idx:start_idx + batchsize]
            else:
                excerpt = slice(start_idx, start_idx + batchsize)
            yield inputs[excerpt], targets[excerpt]
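
For example (a sketch with dummy numpy arrays; the shapes are made up for illustration), the generator now survives far more requests than one pass over the data:

X = np.random.rand(22, 8, 8, 1)
y = np.random.rand(22, 8, 8, 1)

gen = batch_generator(X, y, batchsize=2)
for _ in range(30):        # 30 requests > 11 batches per pass
    xb, yb = next(gen)     # never raises StopIteration
print(xb.shape, yb.shape)  # (2, 8, 8, 1) (2, 8, 8, 1)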
– cuckooo
  • This answer really helped me. For those coming to this answer later: in Keras, generators passed to fit_generator need to be infinitely iterable. The idea is that the function that creates the generator is responsible for cycling through your data as many times as needed. (Perhaps you can edit and add it in.) – Paritosh Singh Feb 20 '19 at 07:47
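
Applied to the generator from the question, the minimal change is one extra loop (a sketch, untested; _read_train_image and _read_train_mask are the question's own helpers):

def _batch_generator(data, batch_size):
    while True:  # start over once every record has been yielded
        for indexes in _chunks(range(len(data)), batch_size):
            batch_X = []
            batch_y = []
            for index in indexes:
                record = data[index]
                batch_X.append(_read_train_image(record["id"], record["index"]))
                mask = imresize(_read_train_mask(record["id"], record["index"]), (1276, 1916)) >= 123
                batch_y.append(mask.reshape((1276, 1916, 1)))
            yield np.array(batch_X), np.array(batch_y)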

A note about this issue in case others come to this page chasing it. The StopIteration bug is a known issue in Keras that can sometimes be fixed by making sure your number of samples is an integer multiple of your batch size. If that does not fix the issue, one thing I have found is that odd file formats that the data generator can't read will also sometimes cause a StopIteration error. To fix this, I run a script on my training folder that converts all of the images to a standard file type (jpg or png) before training. It looks something like this:

import glob
from PIL import Image

d = 1
for sample in glob.glob(r'C:\Users\Jeremiah\Pictures\training\classLabel_unformatted\*'):
    im = Image.open(sample)
    # re-save every image as PNG under a simple numeric name
    im.save(r'C:\Users\Jeremiah\Pictures\training\classLabel_formatted\%s.png' % d)
    d += 1

I've found that running this script, or something like it, drastically reduces the frequency of these sorts of errors, especially when my training data comes from somewhere like Google Image Search.
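
A related precaution (a sketch; it assumes Pillow is installed) is to scan for unreadable files up front instead of letting the generator choke on one mid-epoch:

import glob
from PIL import Image

bad_files = []
for path in glob.glob(r'C:\Users\Jeremiah\Pictures\training\classLabel_unformatted\*'):
    try:
        with Image.open(path) as im:
            im.verify()  # raises an exception on corrupt or truncated image data
    except Exception:
        bad_files.append(path)
print('unreadable files:', bad_files)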


I found the source of the problem. First, my dataset is fully read before the fit finishes, so the generator thread raises:

Exception in thread Thread-50:
Traceback (most recent call last):
File "C:\Anaconda3\Lib\threading.py", line 916, in _bootstrap_inner
    self.run()
File "C:\Anaconda3\Lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
File "C:\Users\user\venv\machinelearning\lib\site-packages\keras\utils\data_utils.py", line 560, in data_generator_task
    generator_output = next(self._generator)
StopIteration

The exception handler sets the stop event and re-raises the exception.

But look at the get method that feeds fit_generator:

def get(self):
    """Creates a generator to extract data from the queue.

    Skip the data if it is `None`.

    # Returns
        A generator
    """
    while self.is_running():
        if not self.queue.empty():
            inputs = self.queue.get()
            if inputs is not None:
                yield inputs
        else:
            time.sleep(self.wait_time)

So even when the stop event is set, data can still be loaded from the queue.

So I limited max_queue_size to 1.
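
In the Keras 2 API that is just the max_queue_size argument of fit_generator (it is visible in the signature in the traceback above); a sketch of the adjusted call:

model.fit_generator(train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=epochs,
                    verbose=1,
                    max_queue_size=1)  # don't let the background thread prefetch past the final batch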

  • The source of the problem is the generator: in Keras, generators passed to fit_generator need to be infinitely iterable. Is your problem solved with the time.sleep? I wouldn't bet on it, but I'm curious. – Thomas Grsp Sep 19 '17 at 21:40