12

I've seen image decoders like tf.image.decode_png in TensorFlow, but how about reading audio files (WAV, Ogg, MP3, etc.)? Is it possible without TFRecord?

E.g. something like this:

filename_queue = tf.train.string_input_producer(['my-audio.ogg'])
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
my_audio = tf.audio.decode_ogg(value)
Matthias Winkelmann
Carl Thomé

4 Answers

6

Yes, there are dedicated decoders in the tensorflow.contrib.ffmpeg package. To use them, you need to install ffmpeg first.

Example:

import tensorflow as tf  # TensorFlow 1.x only: tf.contrib was removed in 2.x

audio_binary = tf.read_file('song.mp3')
waveform = tf.contrib.ffmpeg.decode_audio(audio_binary, file_format='mp3', samples_per_second=44100, channel_count=2)
sygi
  • tf.contrib.ffmpeg is not available on Windows for some reason. I have ffmpeg installed via conda and it simply does not work. [TF GitHub issue posted here](https://github.com/tensorflow/tensorflow/issues/8271) – Johny Vaknin Aug 16 '18 at 12:15
  • 1
    That's great! In TensorFlow 1.10.0 you need to pass the `channel_count` parameter, otherwise it will throw the following error: `ValueError: Tried to convert 'channel_count' to a tensor and failed. Error: None values not supported.` – Oriol Nieto Sep 06 '18 at 21:08
6

The answer from @sygi unfortunately does not work in TensorFlow 2.x, where tf.contrib was removed. An alternative is to do the MP3 decoding with an external library (e.g. pydub or librosa) and integrate it into the tf.data pipeline with tf.py_function. You can do something along the lines of:

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from pydub import AudioSegment

dataset = tf.data.Dataset.list_files('path/to/mp3s/*')

def decode_mp3(mp3_path):
    # py_function passes an eager string tensor; convert it to a Python str.
    mp3_path = mp3_path.numpy().decode("utf-8")
    mp3_audio = AudioSegment.from_file(mp3_path, format="mp3")
    # Samples are integers (interleaved if stereo); cast to match Tout below.
    return np.array(mp3_audio.get_array_of_samples(), dtype=np.float32)

dataset = dataset.map(lambda path:
    tf.py_function(func=decode_mp3, inp=[path], Tout=tf.float32))

for features in dataset.take(3):
    data = features.numpy()
    plt.plot(data)
    plt.show()
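
The function passed to tf.py_function can be any Python decoder, not just pydub. As a rough sketch of what such a decode step does for the WAV case, here is a standard-library-only version. It assumes 16-bit PCM input; decode_wav_bytes and make_test_wav are hypothetical helper names made up for this illustration, not part of TensorFlow or pydub.

```python
import array
import io
import math
import sys
import wave


def decode_wav_bytes(data):
    """Decode 16-bit PCM WAV bytes into (sample_rate, frames).

    frames is a list with one entry per frame; each entry holds one float
    in [-1.0, 1.0) per channel.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Scale to floats, then split the interleaved stream into frames.
    scaled = [s / 32768.0 for s in samples]
    frames = [scaled[i:i + channels] for i in range(0, len(scaled), channels)]
    return sample_rate, frames


def make_test_wav(sample_rate=8000, num_frames=80, freq=440.0):
    """Build a small stereo sine-wave WAV in memory (demo input only)."""
    pcm = array.array("h")
    for i in range(num_frames):
        v = int(32767 * math.sin(2 * math.pi * freq * i / sample_rate))
        pcm.extend([v, -v])  # right channel is the inverted left channel
    if sys.byteorder == "big":
        pcm.byteswap()

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return buf.getvalue()


rate, frames = decode_wav_bytes(make_test_wav())
print(rate, len(frames), len(frames[0]))
```

In a real pipeline you would read the bytes from the file path inside the wrapped function and return a float32 array, as decode_mp3 does above. For plain 16-bit PCM WAV files, TensorFlow core also ships tf.audio.decode_wav, so the py_function detour is mainly useful for formats that core TensorFlow does not decode.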

dsalaj
3

Such a function has recently been added to tensorflow_io. You can use it like this:

import tensorflow as tf
import tensorflow_io as tfio
content = tf.io.read_file(path)
audio = tfio.experimental.audio.decode_ogg(content)
Albert
1

In recent versions of TensorFlow, all audio-related utilities have been moved or added to tensorflow_io. To install it, run pip install tensorflow-io

import tensorflow_io as tfio
import tensorflow as tf

fp = 'path/to/mp3'
audio = tfio.audio.decode_mp3(tf.io.read_file(fp))
Asrst