
I am working on classifying sounds from their spectrograms. I already have one solution (convert every audio recording to a spectrogram, save it as an image, and train a neural network on the images), but I want to take a simpler route: skip saving images and convert the audio files directly into tensors. The problem is that I cannot find any useful information on how to build my own dataset from tensors in TensorFlow. Here is an example of such code in PyTorch.

import librosa
import torch
from torch.utils.data import Dataset

class SoundDataset(Dataset):
  def __init__(self, file_names, labels):
    self.file_names = file_names
    self.labels = labels

  def __getitem__(self, index):
    # load the audio file at this index
    path = self.file_names[index]
    scale, sr = librosa.load(path)
    # compute a log-mel spectrogram with 32 mel bands
    mel_spectrogram = librosa.feature.melspectrogram(y=scale, sr=sr, n_fft=2048, hop_length=512, n_mels=32)
    log_mel_spectrogram = librosa.power_to_db(mel_spectrogram)
    trch = torch.from_numpy(log_mel_spectrogram)
    # zero-pad the time axis so every example has 87 frames
    if log_mel_spectrogram.shape[1] != 87:
      delta = 87 - log_mel_spectrogram.shape[1]
      trch = torch.nn.functional.pad(trch, (0, delta))

    return trch, self.labels[index]

  def __len__(self):
    return len(self.file_names)

This class takes the paths to the audio recordings, converts each one into a tensor, and zero-pads the result if it does not match the expected shape. How can I create the same kind of class for TensorFlow? Next is the code that builds the lists of file paths and their class labels, creates a SoundDataset object from them, and wraps it in a data loader. All of this is written for PyTorch. Please tell me how it can be implemented in TensorFlow.

import os
import torch

path = '/content/drive/MyDrive/МДМА/audiodata/for-rerecorded/training/'
files = []
labels = []
# label '0' corresponds to the 'fake' folder, label '1' to the 'real' folder
for lab in '1 0'.split():
  c = 'fake' if lab == '0' else 'real'
  names = os.listdir(path + c)
  for n in names:
    files.append(path + c + '/' + n)
    labels.append(int(lab))

train_dataset = SoundDataset(files, labels)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=20)

1 Answer


If you read the TensorFlow documentation (for example, the simple audio recognition tutorial), there are code patterns for exactly this kind of pipeline.

This is not tested, but if you load the label for each file from another data structure that maps files to their indexes, this code should help.

import librosa
import pathlib
import tensorflow as tf

DATASET_PATH = 'data/mini_speech_commands'

data_dir = pathlib.Path(DATASET_PATH)
if not data_dir.exists():
  tf.keras.utils.get_file(
      'mini_speech_commands.zip',
      origin="http://storage.googleapis.com/download.tensorflow.org/data/mini_speech_commands.zip",
      extract=True,
      cache_dir='.', cache_subdir='data')


def load_audio(filename):
    # inside tf.py_function the argument is a string tensor, so decode it first
    path = filename.numpy().decode('utf-8')
    scale, sr = librosa.load(path)
    # librosa already returns NumPy arrays, so no .numpy() call is needed
    mel_spectrogram = librosa.feature.melspectrogram(y=scale, sr=sr, n_fft=2048, hop_length=512, n_mels=32)
    log_mel_spectrogram = librosa.power_to_db(mel_spectrogram)
    spectrogram = tf.convert_to_tensor(log_mel_spectrogram, dtype=tf.float32)
    # zero-pad the time axis (second dimension) to 87 frames
    if log_mel_spectrogram.shape[1] != 87:
      delta = 87 - log_mel_spectrogram.shape[1]
      spectrogram = tf.pad(spectrogram, [[0, 0], [0, delta]])
    return spectrogram  # return the spectrogram (add the label/index here if needed)

read_audio = lambda x: tf.py_function(load_audio,
                                      [x],
                                      tf.float32)

filenames = tf.io.gfile.glob(str(data_dir) + '/*/*')
files_ds = tf.data.Dataset.from_tensor_slices(filenames)

waveform_ds = files_ds.map(map_func=read_audio)

The padding has been converted to use TensorFlow directly (tf.pad, with one [before, after] pair per dimension), and you will have to test it yourself.
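To get something comparable to the PyTorch DataLoader in the question (spectrograms paired with labels, in batches of 20), here is a minimal, untested sketch. It assumes you already have the parallel files and labels lists built as in the question, and it reuses the load_audio function above.

# build a dataset of (path, label) pairs from the lists in the question
ds = tf.data.Dataset.from_tensor_slices((files, labels))

def load_audio_with_label(filename, label):
    # wrap load_audio so the label is carried through the pipeline
    spectrogram = tf.py_function(load_audio, [filename], tf.float32)
    spectrogram.set_shape([32, 87])  # 32 mel bands, padded to 87 frames
    return spectrogram, label

train_ds = (ds
            .map(load_audio_with_label, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(20)
            .prefetch(tf.data.AUTOTUNE))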

Update: another approach, using keras.utils.Sequence, is shown in this thread; a rough sketch of that idea follows below.
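For completeness, a minimal, untested sketch of that keras.utils.Sequence approach; it assumes the same files/labels lists as in the question and simply repeats the librosa preprocessing from above inside each batch.

import math
import numpy as np
from tensorflow import keras

class SoundSequence(keras.utils.Sequence):
  def __init__(self, file_names, labels, batch_size=20):
    self.file_names = file_names
    self.labels = labels
    self.batch_size = batch_size

  def __len__(self):
    # number of batches per epoch
    return math.ceil(len(self.file_names) / self.batch_size)

  def __getitem__(self, idx):
    # slice out one batch of paths and labels
    batch_paths = self.file_names[idx * self.batch_size:(idx + 1) * self.batch_size]
    batch_labels = self.labels[idx * self.batch_size:(idx + 1) * self.batch_size]
    specs = []
    for path in batch_paths:
      scale, sr = librosa.load(path)
      mel = librosa.feature.melspectrogram(y=scale, sr=sr, n_fft=2048, hop_length=512, n_mels=32)
      log_mel = librosa.power_to_db(mel)
      # zero-pad the time axis to 87 frames, as in the question
      if log_mel.shape[1] < 87:
        log_mel = np.pad(log_mel, ((0, 0), (0, 87 - log_mel.shape[1])))
      specs.append(log_mel)
    return np.stack(specs), np.array(batch_labels)

# usage: model.fit(SoundSequence(files, labels, batch_size=20), epochs=10)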
