How to read Ogg or MP3 audio files in a TensorFlow graph?

Question

I've seen image decoders like tf.image.decode_png in TensorFlow, but how about reading audio files (WAV, Ogg, MP3, etc.)? Is it possible without TFRecord?

E.g. something like this:

filename_queue = tf.train.string_input_producer(['my-audio.ogg'])
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
my_audio = tf.audio.decode_ogg(value)

score 6 · Answer 1 · answered Dec 12 '16 at 22:15

Yes, there are dedicated decoders in the `tensorflow.contrib.ffmpeg` package. To use them, you need to install ffmpeg first.

Example:

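import tensorflow as tf  # TensorFlow 1.x only: tf.contrib was removed in 2.x

audio_binary = tf.read_file('song.mp3')
# Decodes through the ffmpeg binary; yields a rank-2 float32 tensor [samples, channels]
waveform = tf.contrib.ffmpeg.decode_audio(audio_binary, file_format='mp3', samples_per_second=44100, channel_count=2)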

edited Sep 07 '18 at 12:31

answered Dec 12 '16 at 22:15

sygi

tf.contrib.ffmpeg is not available on Windows for some reason. I have ffmpeg installed via conda and it simply does not work. [TF GitHub issue posted here](https://github.com/tensorflow/tensorflow/issues/8271) – Johny Vaknin Aug 16 '18 at 12:15

That's great! In TensorFlow 1.10.0 you need to pass the `channel_count` parameter, otherwise it will throw the following error: `ValueError: Tried to convert 'channel_count' to a tensor and failed. Error: None values not supported.` – Oriol Nieto Sep 06 '18 at 21:08

score 6 · Accepted Answer · answered Mar 05 '20 at 14:01

The approach from @sygi is unfortunately not supported in TensorFlow 2.x, since `tf.contrib` was removed. An alternative is to use an external library (e.g. pydub or librosa) for the MP3 decoding step and integrate it into the pipeline with `tf.py_function`. For example:

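from pydub import AudioSegment  # pydub needs ffmpeg installed to read MP3
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

dataset = tf.data.Dataset.list_files('path/to/mp3s/*')

def decode_mp3(mp3_path):
    # tf.py_function passes an eager string tensor; turn it into a Python str
    mp3_path = mp3_path.numpy().decode("utf-8")
    mp3_audio = AudioSegment.from_file(mp3_path, format="mp3")
    # get_array_of_samples() returns integer samples (interleaved if stereo);
    # cast explicitly so the result matches Tout=tf.float32
    return np.array(mp3_audio.get_array_of_samples(), dtype=np.float32)

dataset = dataset.map(lambda path:
    tf.py_function(func=decode_mp3, inp=[path], Tout=tf.float32))

for features in dataset.take(3):
    data = features.numpy()
    plt.plot(data)
    plt.show()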
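`get_array_of_samples()` returns raw integer samples, interleaved when the file is stereo. If you want float values in [-1, 1) with one column per channel, here is a minimal sketch. `to_float_frames` is a hypothetical helper, not part of pydub, and it assumes signed integer PCM samples (which is what pydub returns for 16-bit audio).

```python
import numpy as np

def to_float_frames(samples, sample_width, channels):
    """Convert interleaved signed integer PCM samples to float32 frames in [-1.0, 1.0).

    samples:      flat sequence, e.g. AudioSegment.get_array_of_samples()
    sample_width: bytes per sample, e.g. AudioSegment.sample_width (2 for 16-bit)
    channels:     number of channels, e.g. AudioSegment.channels
    """
    data = np.asarray(samples, dtype=np.float32)
    # Full-scale magnitude for signed PCM of this width: 32768 for 16-bit audio.
    scale = float(1 << (8 * sample_width - 1))
    # Reshape [L0, R0, L1, R1, ...] into [[L0, R0], [L1, R1], ...].
    return (data / scale).reshape(-1, channels)
```

Inside `decode_mp3` you could then return `to_float_frames(mp3_audio.get_array_of_samples(), mp3_audio.sample_width, mp3_audio.channels)`, which gives a float32 array of shape [frames, channels] instead of a flat interleaved vector.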

score 3 · Answer 3 · answered Mar 20 '20 at 16:54

Such a function was recently added to TensorFlow I/O (`tensorflow_io`). You can use it like this:

import tensorflow as tf
import tensorflow_io as tfio

content = tf.io.read_file(path)
audio = tfio.experimental.audio.decode_ogg(content)

answered Mar 20 '20 at 16:54

Albert

score 1 · Answer 4 · answered Aug 21 '21 at 14:04

For recent versions of TensorFlow, audio-related utilities have been moved to or added to TensorFlow I/O (`tensorflow_io`). To install it, run `pip install tensorflow-io`.

import tensorflow_io as tfio
import tensorflow as tf

fp = 'path/to/mp3'
audio = tfio.audio.decode_mp3(tf.io.read_file(fp))  # decoded PCM samples, shape [samples, channels]
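For WAV files specifically, core TensorFlow 2.x still provides `tf.audio.decode_wav`, so no extra package is needed. Below is a minimal sketch, assuming 16-bit PCM input. `make_wav_bytes` is a hypothetical helper that builds a tiny test file in memory with Python's `wave` module so the example runs without a file on disk. In a real pipeline the bytes would come from `tf.io.read_file(path)`.

```python
import io
import wave

import numpy as np
import tensorflow as tf

def make_wav_bytes(samples, sample_rate=8000):
    """Hypothetical helper: encode int16 mono samples as an in-memory 16-bit PCM WAV."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(np.asarray(samples, dtype="<i2").tobytes())
    return buf.getvalue()

wav_bytes = make_wav_bytes([0, 16384, -16384, 32767])

# decode_wav returns float32 audio in [-1.0, 1.0] of shape [samples, channels]
# together with the sample rate stored in the WAV header.
audio, sample_rate = tf.audio.decode_wav(tf.constant(wav_bytes))
print(audio.shape, int(sample_rate))
```

Because `decode_wav` is a regular TensorFlow op, it can be used directly inside `Dataset.map` without wrapping it in `tf.py_function`.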