
I'm working on an application that should predict interesting moments in 10-second audio files. I divided the audio into 50 ms chunks and extracted notes, so I have 200 notes for each example. When I add a convolutional layer, it returns this error:

ValueError: Input 0 of layer conv1d_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 200]

Here is my code:

def get_dataset(file_path):
  dataset = tf.data.experimental.make_csv_dataset(
      file_path,
      batch_size=12,
      label_name='label',
      na_value='?',
      num_epochs=1,
      ignore_errors=False)
  return dataset

train = get_dataset('/content/gdrive/My Drive/MyProject/train.csv')
test = get_dataset('/content/gdrive/My Drive/MyProject/TestData/manual.csv')
feature_columns = []

for number in range(200):
  feature_columns.append(tf.feature_column.numeric_column('note' + str(number + 1) ))

preprocessing_layer = tf.keras.layers.DenseFeatures(feature_columns)

model = tf.keras.Sequential([
    preprocessing_layer,
    tf.keras.layers.Conv1D(32, 3, padding='same', activation=tf.nn.relu, input_shape=[None, 200]),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(50, activation=tf.nn.relu),
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy'])
model.fit(train, epochs=20)

What causes this problem and how can it be fixed?

Alex Zaitsev

1 Answer


A 1D convolution over sequences expects a 3D input: for each element in the batch, a single vector at each time step. Consider the following:

X = tf.random.normal([10, 200])
convolved = tf.keras.layers.Conv1D(32, 3, padding='same', activation=tf.nn.relu, input_shape=[None, 200])
print(convolved(X))

This throws an error:

ValueError: Input 0 of layer conv1d_3 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [10, 200]

However, if we provide, for each of the 10 batch samples, a 200-dimensional vector at each of 5 time steps:

X = tf.random.normal([10, 5, 200])
convolved = tf.keras.layers.Conv1D(32, 3, padding='same', activation=tf.nn.relu, input_shape=[None, 200])
print(convolved(X))

This works as it should. Therefore, in your case, each audio file needs a vector for each time step (how many steps depends on how you sample the data).
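Applied to the question's data, the simplest option (an assumption on my part, since the answer leaves the sampling strategy open) is to treat each of the 200 notes as one time step with a single feature, i.e. turn the flat `[batch, 200]` tensor into `[batch, 200, 1]` with `tf.expand_dims`:

```python
import tensorflow as tf

# Simulate a batch of 12 examples with 200 note values each,
# as produced by the flat CSV pipeline in the question.
X = tf.random.normal([12, 200])

# Add a feature (channel) axis: [batch, 200] -> [batch, 200, 1],
# i.e. 200 time steps with one feature per step.
X3d = tf.expand_dims(X, axis=-1)

conv = tf.keras.layers.Conv1D(32, 3, padding='same', activation='relu')
out = conv(X3d)
print(out.shape)  # (12, 200, 32)
```

With `padding='same'` the 200 time steps are preserved, and the layer produces 32 filters per step.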

gorjan
  • @gorjan, thanks for the answer. Could you please explain how to make the correct dimensions for my dataset if I want to use convolution? This is how it looks now – Alex Zaitsev Aug 09 '19 at 19:12
  • You need to sample the data before creating the dataset with `tf.data.experimental.make_csv_dataset`. To prepare the data in a shape adequate for Conv1D, you need 3D data of shape `[batch_size, time_steps, feature_size]`. – gorjan Aug 10 '19 at 00:03
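Inside the model itself, the same fix can be sketched with a `Reshape` layer (an assumption: the `DenseFeatures` preprocessing from the question is omitted here, and the input is already a flat `[batch, 200]` tensor of note values; with the question's pipeline, the `Reshape` would go right after the `DenseFeatures` layer):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # [batch, 200] -> [batch, 200, 1]: 200 time steps, 1 feature per step.
    tf.keras.layers.Reshape((200, 1)),
    tf.keras.layers.Conv1D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# A dummy batch of 4 examples builds the model and checks the shapes.
preds = model(tf.random.normal([4, 200]))
print(preds.shape)  # (4, 1)
```

Note also that in the question's code, `input_shape` is passed to the `Conv1D` layer even though it is not the first layer; Keras ignores it there, so the shape must be fixed by actually feeding 3D data, as above.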