Keras load sound instances in batches

Question

I need help trying to figure out how to achieve batch loading with Keras.

So far I'm trying to make a song classifier with Keras CNN. I've built the model below for 10 genre classification.

model = Sequential()

model.add(Conv1D(16, 5, padding="same", input_shape=(1, 661500)))
model.add(Activation("relu"))
model.add(MaxPool1D(pool_size=2, padding="same"))

model.add(Conv1D(16, 5, padding="same"))
model.add(Activation("relu"))
model.add(MaxPool1D(pool_size=2, padding="same"))

model.add(Conv1D(16, 5, padding="same"))
model.add(Activation("relu"))
model.add(MaxPool1D(pool_size=2, padding="same"))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation("relu"))
model.add(Dense(10))
model.add(Activation("softmax"))

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

It's working when I load the instances and labels myself but my computer can't handle 1000 songs at once. I tried using ImageDataGenerator to load them in batches with flow_from_directory. The code is as below:

generator = ImageDataGenerator()
train_generator = generator.flow_from_directory("train",
                                                target_size=(1, 661500),
                                                batch_size=64,
                                                class_mode="categorical")
test_generator = generator.flow_from_directory("test",
                                               target_size=(1, 661500),
                                               batch_size=64,
                                               class_mode="categorical")

model.fit_generator(train_generator,
                    steps_per_epoch=5584,
                    epochs=10,
                    validation_data=test_generator,
                    validation_steps=1861)

I ran into the problem of audio files not being images so I added .wav to whitelisted file formats in

\keras\Lib\site-packages\keras\preprocessing\image.py

This let Keras to find the audio images but it can't really open them. I changed where it is opening them using Pillow to Librosa but it gives more errors. I don't think I can change all of them, so I was wondering if there's a way to achieve batch loading?

Edit: I came by to this question which pointed me to Keras sequences I implemented one as seen below.

class MySequence(Sequence):

    def __init__(self, x_files, y_files, batch_size):
        self.x, self.y = x_files, y_files
        self.batch_size = batch_size

    def __len__(self):
        from numpy import ceil
        return int(ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        from librosa import load
        from numpy import array, expand_dims
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]

        return expand_dims(array([load(file_name)[0] for file_name in batch_x]), axis=1), array(batch_y)

This time the train times got ridiculously long. Previously, 10 epochs were completed in 3 hours but now it takes 14 hours for one epoch. Is there anything I can do to reduce train times?

Edit 2: Changed steps_per_epoch parameter in fit_generator function and it is down to acceptable levels.

score 1 · Accepted Answer · answered Mar 19 '18 at 10:41

1

It seems like IO is the problem here. Maybe serialize your data in somthing like pickle and see if the training gets faster.

Step_per_epoch simply reduce the step in your epoch. So your training does not get faster you just don't use all your data in each epoch.

answered Mar 19 '18 at 10:41

dennis-w

2,166
1
13
23

I assumed step_per_epoch should be sample_count / batch_size. When I train loading all of the data to RAM and using fit function, training progress is written as batch_size/sample_count [=========>....................]. Can you explain why it is different while using Sequence and normal training? How can I achieve normal training process using Sequences. – Emre İyican Mar 19 '18 at 14:50
1

I never used steps_per_epoch, but with using Sequence the model function don't know the batchsize. It just receive the data from the Sequence so it can't tell wheter the first dimension is the bathc_size or something different. So it counts number of calls/Sequence.__len__(not 100% sure). The number besides the bar is different from using the normal fit function but this shouldn't make any difference in training. – dennis-w Mar 19 '18 at 15:21
From what I could understand step is how many times the Sequence.__getitem__ is called so batch_size * step_per_epoch = sample_count, as if I leave it to default, keras sets it to that value. Thank you for your help. – Emre İyican Mar 21 '18 at 12:47

Keras load sound instances in batches

1 Answers1