
I'm working on a project that isolates the vocal parts from an audio track. I'm using the DSD100 dataset, but for testing I'm using the DSD100subset dataset, from which I only use the mixtures and the vocals. I'm basing this work on this article.

First I process the audio files to extract their spectrograms and store them in lists, which gives four lists in total (trainMixed, trainVocals, testMixed, testVocals), like this:

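import librosa
import numpy as np

def to_spec(wav, n_fft=1024, hop_length=256):
    # Complex STFT with shape (1 + n_fft // 2, n_frames)
    return librosa.stft(wav, n_fft=n_fft, hop_length=hop_length)

def prepareData(filename, sr=22050, hop_length=256, n_fft=1024):
    # Load at most the first 30 seconds as mono, resampled to sr
    audio_wav = librosa.load(filename, sr=sr, mono=True, duration=30)[0]
    audio_spec = to_spec(audio_wav, n_fft=n_fft, hop_length=hop_length)
    audio_spec_mag = np.abs(audio_spec)  # magnitude spectrogram
    maxVal = np.max(audio_spec_mag)      # peak value, used for normalisation

    return audio_spec_mag, maxVal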


from os import walk

# Repeated for every list (trainMixed, trainVocals, testMixed, testVocals)
trainMixed = []
trainMixedNum = 0
for (root, dirs, files) in walk('./Dev-subset-mix/Dev/'):
  for d in dirs:
    filenameMix = './Dev-subset-mix/Dev/'+d+'/mixture.wav'
    spec_mag, maxVal = prepareData(filenameMix, n_fft=1024, hop_length=256)
    trainMixed.append(spec_mag/maxVal)  # normalise each spectrogram to [0, 1]
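
As a sanity check on the shapes, the spectrogram size these parameters should produce can be worked out by hand. The sketch below assumes librosa's default center=True padding and a clip that lasts the full 30 seconds; stft_shape is a hypothetical helper, not part of librosa.

```python
def stft_shape(num_samples, n_fft=1024, hop_length=256):
    """Expected (freq_bins, frames) of an STFT, assuming center=True padding."""
    freq_bins = 1 + n_fft // 2              # one-sided spectrum
    frames = 1 + num_samples // hop_length  # one frame per hop, plus the first
    return freq_bins, frames

sr, duration = 22050, 30  # values used in prepareData
print(stft_shape(sr * duration))  # (513, 2584)
```

Under those assumptions each list entry is a (513, 2584) array, which agrees with the 2584 that appears in the error message.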

Next I build the model:

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.optimizers import SGD
from keras.layers.advanced_activations import LeakyReLU

model = Sequential()
model.add(Conv2D(16, (3,3), padding='same', input_shape=(513, 25, 1)))
model.add(LeakyReLU())
model.add(Conv2D(16, (3,3), padding='same'))
model.add(LeakyReLU())
model.add(MaxPooling2D(pool_size=(3,3)))
model.add(Dropout(0.25))
model.add(Conv2D(16, (3,3), padding='same'))
model.add(LeakyReLU())
model.add(Conv2D(16, (3,3), padding='same'))
model.add(LeakyReLU())
model.add(MaxPooling2D(pool_size=(3,3)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(64))
model.add(LeakyReLU())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss=keras.losses.binary_crossentropy, optimizer=sgd, metrics=['accuracy'])

And run the model:

model.fit(trainMixed, trainVocals,epochs=10, validation_data=(testMixed, testVocals))

But I get this error:

ValueError: in user code:

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:806 train_function  *
        return step_function(self, iterator)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:796 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:1211 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2585 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2945 _call_for_each_replica
        return fn(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:789 run_step  **
        outputs = model.train_step(data)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:747 train_step
        y_pred = self(x, training=True)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:976 __call__
        self.name)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/input_spec.py:158 assert_input_compatibility
        ' input tensors. Inputs received: ' + str(inputs))

    ValueError: Layer sequential_1 expects 1 inputs, but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, 2584) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(None, 2584) dtype=float32>]

I'm new to this topic. Thanks in advance for any help.

Jorge Ramón

2 Answers


It's probably an issue with how the input data is passed to Keras' fit() function. I would recommend using a tf.data.Dataset as the input to fit(), like so:

import tensorflow as tf

train_data = tf.data.Dataset.from_tensor_slices((trainMixed, trainVocals))
valid_data = tf.data.Dataset.from_tensor_slices((testMixed, testVocals))

model.fit(train_data, epochs=10, validation_data=valid_data)

You can then also use functions like shuffle() and batch() on the TF datasets.

EDIT: It also seems that your input shapes are incorrect. The input_shape you specified for the first conv layer is (513, 25, 1), so the input should be a batch tensor of shape (batch_size, 513, 25, 1), whereas you are passing tensors of shape (batch_size, 2584). You'll need to reshape, and probably slice, your inputs to the specified shape, or specify a different input_shape.
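
One way to get from whole spectrograms to that shape is sketched below, under some assumptions: each spectrogram is a (513, 2584) array as in the error message, non-overlapping windows of 25 frames are acceptable, and slice_spectrogram is a hypothetical helper name. Random data stands in for the real spectrograms. Stacking everything into one NumPy array also matters, since Keras generally treats a plain Python list of arrays as several separate inputs, which is likely where "expects 1 inputs, but it received 2" comes from.

```python
import numpy as np

def slice_spectrogram(spec, width=25):
    """Cut a (freq, frames) spectrogram into non-overlapping (freq, width, 1) windows."""
    n_windows = spec.shape[1] // width
    trimmed = spec[:, :n_windows * width]  # drop the leftover frames at the end
    # (freq, n_windows, width) -> (n_windows, freq, width)
    windows = trimmed.reshape(spec.shape[0], n_windows, width).transpose(1, 0, 2)
    return windows[..., np.newaxis]        # add the channel axis

# Stand-in for trainMixed: two normalised magnitude spectrograms
specs = [np.random.rand(513, 2584).astype(np.float32) for _ in range(2)]

# One array holding every window from every track
x = np.concatenate([slice_spectrogram(s) for s in specs], axis=0)
print(x.shape)  # (206, 513, 25, 1)
```

The targets need a matching first dimension, and their remaining shape has to agree with the model's output.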

Aaron Keesing
  • Hi, thanks for the help. I tried your code but the error changed to: `ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=4, found ndim=2. Full shape received: [513, 2584]` – Jorge Ramón Sep 06 '20 at 17:39
  • I've updated my answer. The problem is that the shapes are incompatible, so you'll need to get your input to the shape that the `Conv2D` layer expects. Where did the shape `(513, 25, 1)` come from? – Aaron Keesing Sep 07 '20 at 00:23
  • Oh yes, I forgot to slice the input data into that shape, thanks for the reply. – Jorge Ramón Sep 07 '20 at 17:57
  • Could you perhaps add some context as to why you are making this recommendation? How/why does converting these into `tf.data.Dataset`s solve the problem? Asking as someone a bit new to Tensorflow/Keras and I'm trying to wrap my head around this. Cheers – Danny Bullis Feb 23 '22 at 06:27
  • @DannyBullis In my experience, using the TensorFlow data pipeline minimises incompatibilities to do with converting between NumPy arrays, Python objects, and tensors, as well as being useful for manipulating data asynchronously and using multiple workers, etc. – Aaron Keesing Feb 24 '22 at 02:00

Basically, whatever input_shape you declare for Conv2D, the layer requires a 4D array when you feed an input X to it, where X.shape looks like (batch, rows, cols, channels).

The example below illustrates this for Conv2D:

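import numpy as np
from tensorflow.keras import layers  # assumed import; not shown in the original snippet

input_layer = layers.InputLayer(input_shape=(2, 2, 1))  # per-example shape, batch axis omitted
conv1 = layers.Conv2D(3, (2, 2))
X = np.ones((2, 2))
X = X.reshape(1, X.shape[0], X.shape[1], 1)  # shape of X is 4D, (1, 2, 2, 1)
conv1(input_layer(X))

TL;DR

Now let's elaborate on the code above.

input_layer is defined with a 3D shape, (2, 2, 1), but X is then reshaped to a 4D shape, (1, 2, 2, 1), which does not look like a match. The input_shape argument describes a single example and leaves out the batch axis, though, so any input X fed to input_layer or Conv2D must be passed with a 4D shape.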
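
As a small NumPy-only sketch of the same idea, the missing batch and channel axes can be added without spelling out every dimension in reshape. The array sizes here are made up for illustration.

```python
import numpy as np

X = np.ones((2, 2))                   # a single 2D example: (rows, cols)
X4 = X[np.newaxis, ..., np.newaxis]   # add batch and channel axes
print(X4.shape)  # (1, 2, 2, 1)

batch = np.ones((8, 513, 25))         # 8 examples that lack a channel axis
batch4 = np.expand_dims(batch, axis=-1)
print(batch4.shape)  # (8, 513, 25, 1)
```

Both forms leave the data untouched and only change how it is indexed.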

Borin --help
  • "input must be 4D shape", is there a statement in the official TF/Keras docs that states this? Or is this some sort of unwritten rule that everyone must follow? – akalanka Apr 06 '23 at 18:32