How to convert .wav files into a Pandas DataFrame in order to feed it to a neural network?

Question

I'm trying to feed .wav files to a neural network in order to train it to detect what's being said. I have around 10 000 .wav files and the transcription of each audio, but when I try to feed the dataset to the neural network I get this error: `ValueError: setting an array element with a sequence.`

I'm using Soundfile to get the .wav data without the header and putting it into a list. I've tried other libraries too but the result was the same.

import os
import numpy as np
from tqdm import tqdm
import pandas as pd
import soundfile as sf

path = os.getcwd() + "/stft wav/"
audios = []
total = len(os.listdir(path))
pbar = tqdm(total = total)
for file in os.listdir(path):
    data, sr = sf.read(path + file)
    audios.append(data)
    pbar.update(1)
pbar.close()

Then I read the file with the transcription and create the dataset that's going to be fed to the neural network.

dict = pd.read_csv("dictionary.csv", sep = '\t')
dataset = pd.DataFrame(columns = ['Audio', 'Word'])
dataset.Audio = audios
dataset.Word = dict.Romaji

The dataset now looks like this:

    Audio                                               Word
0   [-2.686136382767934e-11, 1.5804246800144028e-1...   inshou
1   [5.0145061436523974e-09, 1.3923349584388234e-0...   taishou
2   [-2.253151087927563e-08, 2.173326230092698e-08...   genshou
3   [3.0560468644580396e-07, 1.0646554073900916e-0...   kishou
4   [0.0, 2.499070395067804e-12, 1.206467304531999...   chuushouteki

The arrays in the Audio column don't all have the same size, but I already tried padding them with zeros and the error message stays the same.

This is how I padded them, in case you're wondering:

X = dataset.copy()
# Compute the longest length once instead of on every iteration
max_len = len(max(X['Audio'], key = len))
X['Audio'] = [np.resize(audio, max_len) for audio in tqdm(X['Audio'])]
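
One thing worth checking in the padding code above: `np.resize` fills the extra positions by repeating the array from its start, not with zeros. Below is a minimal sketch of zero-padding with `np.pad` instead, assuming every clip is a 1-D (mono) array. The helper name `pad_to_longest` and the sample values are hypothetical, not taken from the post.

```python
import numpy as np

def pad_to_longest(arrays):
    """Zero-pad 1-D arrays on the right so they all share the longest length."""
    target = max(len(a) for a in arrays)
    padded = [np.pad(a, (0, target - len(a)), mode="constant") for a in arrays]
    # Stacking equal-length arrays yields one 2-D array: (n_clips, n_samples)
    return np.stack(padded)

# Hypothetical stand-ins for three clips of different lengths
clips = [np.array([0.1, 0.2]), np.array([0.3]), np.array([0.4, 0.5, 0.6])]
X = pad_to_longest(clips)
print(X.shape)  # (3, 3)
```

The result is a regular 2-D float array rather than a column of separate arrays, which is the shape most neural network libraries expect for a batch of fixed-length inputs.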

A weird thing I noticed is that when I save this CSV file and read it again the Audio column's float arrays are automatically converted into string arrays. The only way I found to keep it the way it should be is saving it as a pickle file.

Since we're at it, feel free to suggest other methods to feed the .wav files to the neural network. I'm trying to use this method instead of spectrograms because I read here that it's not a good idea.

Solution

I was looking into similar problems and found a simple and elegant solution. After the train-test split, when passing the audios' column to the neural network, use list(X) instead of just X.
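
A likely reason this helps (an assumption, since the post doesn't show the model code): a Series of arrays converts to a 1-D object array, and casting that to float raises the error from the question, whereas a list of equal-length arrays converts to a regular 2-D array. The sketch below uses hypothetical stand-in data for the padded Audio column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the padded Audio column: equal-length arrays in a Series
X = pd.Series([np.zeros(4), np.ones(4), np.full(4, 0.5)])

as_objects = X.to_numpy()      # 1-D object array whose elements are arrays
as_matrix = np.array(list(X))  # regular 2-D float array

print(as_objects.dtype, as_objects.shape)  # object (3,)
print(as_matrix.dtype, as_matrix.shape)    # float64 (3, 4)

# Casting the object array to float fails, because each element is a sequence
conversion_failed = False
try:
    as_objects.astype(np.float64)
except ValueError as err:
    conversion_failed = True
    print("ValueError:", err)
```

This only works when all arrays have the same length, so the padding step still matters.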

About the CSV file converting the float arrays to strings: CSV is a text format, so each array in the Audio column is written out as its printed representation (brackets, scientific notation and all), and Pandas reads that cell back as a plain string instead of parsing it into an array. As I said previously, saving the DataFrame as a pickle file works, but it takes too long to read compared to saving the audio column separately as a .npy file.
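
A minimal sketch of the .npy round trip, assuming the audio has already been padded into one 2-D array. The values and the file name `audios.npy` are hypothetical.

```python
import os
import tempfile
import numpy as np

# Hypothetical padded audio matrix: 3 clips, 4 samples each
audio = np.array([[0.0, 2.5e-12, 1.2e-9, 0.0],
                  [5.0e-9, 1.4e-8, 0.0, 0.0],
                  [-2.3e-8, 2.2e-8, 3.1e-7, 0.0]])

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "audios.npy")  # hypothetical file name
    np.save(npy_path, audio)      # binary format, dtype and shape are preserved
    restored = np.load(npy_path)

print(restored.dtype, restored.shape)   # float64 (3, 4)
print(np.array_equal(audio, restored))  # True
```

If the arrays are left at different lengths, they would have to be stored as an object array, and `np.load` then needs `allow_pickle=True`.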

You can use `librosa`. It is an excellent package for reading audio files and converting them to NumPy arrays. — Shubham Panchal, Feb 17 '19 at 01:47
I also tried with `librosa`, but the issue isn't about converting the list to a NumPy array because when I append the list to a DataFrame this is done automatically. — Victor Almeida, Feb 18 '19 at 16:33

score 1 · Answer 1 · answered Mar 22 '19 at 20:15

Looks like you already solved this, but here are a couple of other items that don't seem to have been mentioned yet. First, `wave` is a module in Python's standard library (it came with my Py3.6 install).

https://docs.python.org/3/library/wave.html

This code is (sorta) stolen from here:

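import wave
import numpy as np

filename = "<filename>"  # path to your .wav file
with wave.open(filename, "rb") as wave_file:
    n_frames = wave_file.getnframes()
    frames = wave_file.readframes(n_frames)
# Assumes 16-bit PCM samples; np.fromstring is deprecated for binary data
ys = np.frombuffer(frames, dtype=np.int16)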

That should enable you to put your data into a DF pretty easily, which appears to be the main item you're asking about based on your thread title.
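
For a version that can be checked end to end, here is a hedged sketch that writes a tiny 16-bit mono file and reads it back with `wave`. The helper name `read_wav_int16`, the file name, and the sample values are hypothetical, and the sketch assumes uncompressed 16-bit PCM data.

```python
import os
import tempfile
import wave
import numpy as np

def read_wav_int16(path):
    """Read a 16-bit PCM .wav file into an array of shape (n_frames, n_channels)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2")  # WAV PCM data is little-endian
    return samples.reshape(-1, n_channels), rate

# Write a tiny hypothetical mono file so the example is self-contained
original = np.array([0, 1000, -1000, 32767, -32768], dtype="<i2")
with tempfile.TemporaryDirectory() as tmp:
    wav_path = os.path.join(tmp, "example.wav")
    with wave.open(wav_path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(16000)
        wf.writeframes(original.tobytes())
    data, rate = read_wav_int16(wav_path)

print(data.shape, rate)  # (5, 1) 16000
```

Unlike `soundfile`, which returns floats by default, this gives raw integer samples, so you may want to scale them (for example by dividing by 32768) before training.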

Lastly, regarding your DF issues with dtypes, note that the DataFrame constructor has a `dtype` argument for forcing a data type, which I have used in situations like the one you find yourself in.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
