Import wav file in Tensorflow 2

Question

Using Python 3.7 and TensorFlow 2.0, I'm having a hard time reading wav files from the UrbanSounds dataset. This question and answer are helpful because they explain that the input has to be a string tensor, but the decoder seems unable to get past the initial metadata encoded in the file to the actual audio data. Do I have to preprocess the string before I can load it as a float32 tensor? I already had to preprocess the data by converting it from 24-bit wav to 16-bit wav, so the data-input pipeline is turning out to be much more cumbersome than I expected. The required conversion is particularly frustrating. Here's what I'm trying so far:

```python
import tensorflow as tf  # this is TensorFlow 2.0

path_to_wav_file = '/mnt/d/Code/UrbanSounds/audio/fold1/101415-3-0-2.wav'
# Read the raw bytes of the wav file into a string tensor
input_data = tf.io.read_file(path_to_wav_file)
# Decode the string tensor into a float32 audio tensor plus its sample rate
audio, sampling_rate = tf.audio.decode_wav(input_data)
```

This is the error I get at the last step:

```
2019-10-08 20:56:09.124254: W tensorflow/core/framework/op_kernel.cc:1546] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected fmt  but found junk
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/ops/gen_audio_ops.py", line 216, in decode_wav
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Header mismatch: Expected fmt  but found junk [Op:DecodeWav]
```

And here is the beginning of that string tensor. I'm no expert on wav files, but I think the part after "fmt" is where the actual audio data starts. Before that I think it's all metadata about the file.

data.numpy()[:70]
b'RIFFhb\x05\x00WAVEjunk\x1c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00fmt \x10\x00\x00\x00\x01\x00\x01\x00D\xac\x00\x00\x88X\x01\x00\x02\x00'
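
For reference, the 70 bytes above can be walked chunk by chunk. The sketch below is plain Python (`list_riff_chunks` is a hypothetical helper, not part of TensorFlow) and assumes the standard RIFF layout: each sub-chunk is a 4-byte ID followed by a little-endian 4-byte size and a body padded to an even length. It shows a 28-byte `junk` chunk sitting between the `WAVE` tag and the `fmt ` chunk, which matches the "Expected fmt but found junk" error. Note that in a WAVE file the `fmt ` chunk only holds format parameters; the samples themselves live in a later `data` chunk, beyond these 70 bytes.

```python
import struct


def list_riff_chunks(data: bytes):
    """Hypothetical helper: list the sub-chunks of a RIFF/WAVE byte string.

    Returns (riff_size, [(chunk_id, chunk_size, offset), ...]). A chunk is
    listed as soon as its 8-byte header is present, even if its body is cut
    off (as it is in a 70-byte prefix).
    """
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    # RIFF sizes are little-endian unsigned 32-bit integers.
    (riff_size,) = struct.unpack("<I", data[4:8])
    chunks = []
    offset = 12  # first sub-chunk follows "RIFF", the size field and "WAVE"
    while offset + 8 <= len(data):
        chunk_id = data[offset:offset + 4].decode("latin-1")
        (chunk_size,) = struct.unpack("<I", data[offset + 4:offset + 8])
        chunks.append((chunk_id, chunk_size, offset))
        # Skip the 8-byte header plus the body, padded to an even length.
        offset += 8 + chunk_size + (chunk_size & 1)
    return riff_size, chunks


# The 70-byte prefix printed in the question.
prefix = (b"RIFFhb\x05\x00WAVEjunk\x1c\x00\x00\x00" + b"\x00" * 28
          + b"fmt \x10\x00\x00\x00\x01\x00\x01\x00D\xac\x00\x00\x88X\x01\x00\x02\x00")
riff_size, chunks = list_riff_chunks(prefix)
print(riff_size, chunks)  # 352872 [('junk', 28, 12), ('fmt ', 16, 48)]
```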

For testing purposes, did you try to remove the "junk" header manually from the data (from `junk` up to the byte before `fmt`) to see if it works? My guess is their decoder is pretty basic and can't fully handle the [RIFF](https://en.m.wikipedia.org/wiki/Resource_Interchange_File_Format)/WAVE format. — Matthieu, Oct 11 '19 at 04:01
And did you change the header length accordingly (The `hb\x05\x00` part)? — Matthieu, Oct 11 '19 at 04:08
Going by [the format description](https://en.m.wikipedia.org/wiki/Interchange_File_Format), each chunk stores its length as a 32-bit integer (little endian in RIFF, unlike big-endian IFF), so if you remove the 32 bytes of the `junk` chunk, you'd have to subtract them from its parent chunk, whose size is `0x056268` (the `hb\x05\x00` part). You'd then need `RIFFHb\x05\x00`, if my calculation is correct (but please double-check it :)) — Matthieu, Oct 11 '19 at 04:19
I'll give that a try but this can't possibly be the way we're meant to import audio files in TensorFlow. — Alex, Oct 11 '19 at 04:23
It's actually 36 bytes to remove, so `Db\x05\x00`, sorry about that. — Matthieu, Oct 11 '19 at 04:26
Of course not, but it's just to understand if their decoder is broken and you need to convert the data to a format it accepts (with an appropriate tool). — Matthieu, Oct 11 '19 at 04:27
I have the files on my computer but I can't find them online anymore. Seeing it work with any file would be fine though. So if you can show how to process any wav file, that would be good enough. There are just no examples in the tensorflow docs. — Alex, Oct 15 '19 at 12:35
Unfortunately, I suspect the root problem is the `junk` chunk in the file. I tried the WindowsStart.wav file, and also a copy patched to include a `junk` chunk; both work fine with the code you gave, but I'm using TensorFlow 1.14.0. Can you try with that version too (1.14.0), or are you "stuck" with 2.0? — Matthieu, Oct 15 '19 at 14:09
I think the reason this appears to work in TensorFlow 1.14 is that TF1 uses deferred (graph) execution, so you can run `audio, sampling_rate = tf.audio.decode_wav(input_data)` without getting an error, but that seems to be only because the op hasn't actually been executed yet. — Alex, Oct 16 '19 at 01:22
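
The manual fix suggested in the comments (drop the `junk` chunk and shrink the RIFF size field accordingly) can be sketched in plain Python. `strip_chunks_before_fmt` is a hypothetical helper, not a TensorFlow API; it assumes a well-formed RIFF/WAVE file whose `fmt ` chunk appears somewhere after the `WAVE` tag. Applied to the 70-byte prefix from the question, it removes 8 + 28 = 36 bytes and turns the size field into `Db\x05\x00` (0x056268 − 36 = 0x056244), matching the corrected value in the comments.

```python
import struct


def strip_chunks_before_fmt(data: bytes) -> bytes:
    """Hypothetical helper: remove sub-chunks (e.g. 'junk') before 'fmt '.

    The little-endian RIFF size field is reduced by the number of bytes
    removed, so the patched header stays consistent.
    """
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    offset = 12  # first sub-chunk follows "RIFF", the size field and "WAVE"
    while data[offset:offset + 4] != b"fmt ":
        if offset + 8 > len(data):
            raise ValueError("no 'fmt ' chunk found")
        (size,) = struct.unpack("<I", data[offset + 4:offset + 8])
        offset += 8 + size + (size & 1)  # skip header and even-padded body
    removed = offset - 12
    (riff_size,) = struct.unpack("<I", data[4:8])
    header = b"RIFF" + struct.pack("<I", riff_size - removed) + b"WAVE"
    return header + data[offset:]


# The 70-byte prefix from the question: 'junk' (8 + 28 bytes) precedes 'fmt '.
prefix = (b"RIFFhb\x05\x00WAVEjunk\x1c\x00\x00\x00" + b"\x00" * 28
          + b"fmt \x10\x00\x00\x00\x01\x00\x01\x00D\xac\x00\x00\x88X\x01\x00\x02\x00")
patched = strip_chunks_before_fmt(prefix)
print(patched[:16])  # b'RIFFDb\x05\x00WAVEfmt '
```

In eager TensorFlow, the patched bytes of a full file could then be passed on, e.g. `tf.audio.decode_wav(strip_chunks_before_fmt(tf.io.read_file(path).numpy()))`. This is untested against the UrbanSounds files, and `decode_wav` in TF 2.0 still expects 16-bit PCM.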

devnull · Accepted Answer · score 8 · 2019-10-18T10:17:04.577

It seems your error comes from TensorFlow expecting the `fmt ` chunk right at the beginning, immediately after the `RIFF`/`WAVE` header.

The TensorFlow code that parses the header can be found here: https://github.com/tensorflow/tensorflow/blob/c9cd1784bf287543d89593ca1432170cdbf694de/tensorflow/core/lib/wav/wav_io.cc#L225

There's also an open issue, awaiting a response from the TensorFlow team, that covers roughly the same error you've reported: https://github.com/tensorflow/tensorflow/issues/32382

Other libraries simply skip the `junk` chunk, which is why the file loads with them.

edited Oct 18 '19 at 10:17

answered Oct 18 '19 at 02:41

devnull


Your answer is that this is a bug, and unfortunately you seem to be correct. The links are helpful. – Alex Oct 18 '19 at 04:54
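
As the accepted answer notes, other libraries skip the `junk` chunk. Python's standard-library `wave` module is one example: while searching for the `fmt ` and `data` chunks, it skips sub-chunks it doesn't recognise. The sketch below assumes 16-bit PCM input; `read_wav_float32` and `_make_wav_with_junk` are hypothetical helpers (the latter only builds a tiny in-memory test file with a `junk` chunk). Dividing by 32768 is meant to mirror the [-1.0, 1.0) scaling of `tf.audio.decode_wav`, and the output is laid out as `[frame][channel]`.

```python
import array
import io
import struct
import sys
import wave


def read_wav_float32(source):
    """Hypothetical helper: read 16-bit PCM WAV as float lists via `wave`."""
    with wave.open(source, "rb") as w:  # unknown chunks like 'junk' are skipped
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = w.getnchannels()
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    samples = array.array("h", frames)  # interleaved signed 16-bit samples
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    floats = [s / 32768.0 for s in samples]
    # Regroup the interleaved samples into one list per frame.
    audio = [floats[i:i + channels] for i in range(0, len(floats), channels)]
    return audio, rate


def _make_wav_with_junk(samples, rate=44100):
    """Hypothetical test helper: mono 16-bit WAV with a 28-byte 'junk' chunk."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    plain = buf.getvalue()
    junk = b"junk" + struct.pack("<I", 28) + b"\x00" * 28
    (riff_size,) = struct.unpack("<I", plain[4:8])
    # Insert 'junk' right after "WAVE" and grow the RIFF size to match.
    return (b"RIFF" + struct.pack("<I", riff_size + len(junk)) + b"WAVE"
            + junk + plain[12:])


audio, rate = read_wav_float32(io.BytesIO(_make_wav_with_junk([0, 16384, -32768])))
print(rate, audio)  # 44100 [[0.0], [0.5], [-1.0]]
```

The nested list can be turned into a tensor with `tf.constant(audio, dtype=tf.float32)`. For real files, decoding the frame bytes with NumPy would be much faster than the list comprehension used here for clarity.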

score 5 · Answer 2 · answered Oct 16 '19 at 09:22

It seems that your code fails for dual-channel audio files; it works for mono wav files. In your case, you can try using scipy instead:

```python
from scipy.io import wavfile as wav

# Returns the sample rate and the samples as a NumPy array
sampling_rate, data = wav.read('101415-3-0-2.wav')
```

answered Oct 16 '19 at 09:22

ravikt


This is helpful because it provides an alternative, but it doesn't answer the question in the sense that there is presumably a way to do this entirely within the tensorflow library. – Alex Oct 17 '19 at 04:16
@Alex, but does the problematic .wav file that fails with TensorFlow work with scipy? – Matthieu Oct 17 '19 at 08:07

@Matthieu The problematic wav file works with other libraries, but not with TensorFlow's decode_wav operator. @ravikt Are you sure the header of the mono wav file you tested contains a `junk` chunk where `fmt ` is expected? – Kautham Krishna Oct 31 '19 at 07:10
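
Following up on the scipy suggestion: `scipy.io.wavfile.read` returns an int16 NumPy array for 16-bit PCM files, shaped `(samples,)` for mono and `(samples, channels)` otherwise, whereas `tf.audio.decode_wav` returns float32 values in [-1.0, 1.0) shaped `[samples, channels]`. A minimal sketch, assuming 16-bit input, that brings scipy's output into that layout; `pcm16_to_decode_wav_layout` is a hypothetical helper name.

```python
import numpy as np


def pcm16_to_decode_wav_layout(data: np.ndarray) -> np.ndarray:
    """Hypothetical helper: int16 samples -> float32 [samples, channels]."""
    if data.dtype != np.int16:
        raise ValueError("this sketch only handles 16-bit PCM")
    if data.ndim == 1:  # mono: scipy returns shape (samples,)
        data = data[:, np.newaxis]
    # Scale to [-1.0, 1.0), intended to match tf.audio.decode_wav.
    return data.astype(np.float32) / 32768.0


mono = pcm16_to_decode_wav_layout(np.array([0, 16384, -32768], dtype=np.int16))
stereo = pcm16_to_decode_wav_layout(np.array([[1, 2], [3, 4]], dtype=np.int16))
print(mono.shape, stereo.shape)  # (3, 1) (2, 2)
```

The result can then be wrapped with `tf.convert_to_tensor(...)`, so only the file reading happens outside TensorFlow.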
