From audio to tensor, back to audio in tensorflow

Question

Is there a way to load an audio file (WAV) directly into a tensor in TensorFlow, and then convert the tensor back into an audio file? I have seen people transform audio into spectrograms, but I couldn't find anyone converting a spectrogram back to audio.

score 6 · Answer 1 · edited Dec 30 '19 at 12:45

TensorFlow 1.x:

The tf.contrib.ffmpeg.decode_audio() op can load audio data (including WAV) into a tensor, and the tf.contrib.ffmpeg.encode_audio() op can convert it back into audio data.

import tensorflow as tf  # TensorFlow 1.x; tf.contrib is not available in 2.x

input_filename = tf.placeholder(tf.string, shape=[])
output_filename = tf.placeholder(tf.string, shape=[])

input_signal = tf.contrib.ffmpeg.decode_audio(
    tf.read_file(input_filename), file_format="wav",
    samples_per_second=44100, channel_count=2)

# ...

output_signal = ...  # A 2-D tensor, [samples x channels]
encoded_audio_data = tf.contrib.ffmpeg.encode_audio(
    output_signal, file_format="wav", samples_per_second=44100)

write_file_op = tf.write_file(output_filename, encoded_audio_data)

with tf.Session() as sess:
  sess.run(write_file_op, {input_filename: "input.wav",
                           output_filename: "output.wav"})

TensorFlow 2.x:

The tf.contrib module was removed in TensorFlow 2.x, but you can still load and save audio files in 16-bit PCM WAV format using eager execution and tf.audio:

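import tensorflow as tf

# decode_wav() takes the file contents (not a filename) and returns a tuple (audio, sample_rate).
input_signal = tf.audio.decode_wav(tf.io.read_file("input.wav"))

# Returns a Tensor of type string holding the encoded WAV data.
output_signal = tf.audio.encode_wav(input_signal[0], input_signal[1])

# Write the encoded data to disk.
tf.io.write_file("output.wav", output_signal)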
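
On the spectrogram part of the question: a complex STFT keeps the phase, so it can be inverted back to a waveform. The following is only a minimal sketch of that round trip using tf.signal.stft and tf.signal.inverse_stft. The synthetic sine wave and the SAMPLE_RATE, FRAME_LENGTH and FRAME_STEP values are assumptions chosen for illustration, not anything from the original answer. A magnitude-only spectrogram discards the phase and would need a phase-estimation method (for example Griffin-Lim), which this sketch does not cover.

```python
import math
import tensorflow as tf

SAMPLE_RATE = 16000   # assumed sample rate for this illustration
FRAME_LENGTH = 512    # assumed STFT window size in samples
FRAME_STEP = 128      # assumed hop size in samples

# One second of a 440 Hz sine wave stands in for decoded audio (mono, shape [samples]).
t = tf.range(SAMPLE_RATE, dtype=tf.float32) / SAMPLE_RATE
waveform = 0.5 * tf.sin(2.0 * math.pi * 440.0 * t)

# Forward transform: complex64 tensor of shape [frames, fft_bins].
stft = tf.signal.stft(waveform, FRAME_LENGTH, FRAME_STEP)

# Inverse transform; inverse_stft_window_fn builds the matching synthesis window.
reconstructed = tf.signal.inverse_stft(
    stft, FRAME_LENGTH, FRAME_STEP,
    window_fn=tf.signal.inverse_stft_window_fn(FRAME_STEP))

# Compare away from the edges, where fewer frames overlap.
interior_error = tf.reduce_max(
    tf.abs(waveform[FRAME_LENGTH:-FRAME_LENGTH]
           - reconstructed[FRAME_LENGTH:-FRAME_LENGTH]))

# encode_wav expects [samples, channels], so add a channel axis before encoding.
wav_bytes = tf.audio.encode_wav(reconstructed[:, tf.newaxis], SAMPLE_RATE)
```

The resulting wav_bytes tensor can be written with tf.io.write_file in the same way as any other encoded WAV data. Samples near the start and end of the signal are covered by fewer frames, so the reconstruction is expected to be less accurate there.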
