Is there any way to load an audio file (WAV) directly into a tensor in TensorFlow, and then convert the tensor back into an audio file? I saw some people transforming audio into spectrograms, but I couldn't find anyone who could convert from the spectrogram back to audio.
TensorFlow 1.x:
The tf.contrib.ffmpeg.decode_audio() op can load audio data (including WAV) into a tensor, and the tf.contrib.ffmpeg.encode_audio() op can convert it back into audio data.
import tensorflow as tf  # TensorFlow 1.x

# Scalar string placeholders for the input and output file paths.
input_filename = tf.placeholder(tf.string, shape=[])
output_filename = tf.placeholder(tf.string, shape=[])

# Decode the file contents into a 2-D float tensor, [samples x channels].
input_signal = tf.contrib.ffmpeg.decode_audio(
    tf.read_file(input_filename), file_format="wav",
    samples_per_second=44100, channel_count=2)

# ...
output_signal = ...  # A 2-D tensor, [samples x channels]

# Encode the tensor back into WAV bytes and write them to disk.
encoded_audio_data = tf.contrib.ffmpeg.encode_audio(
    output_signal, file_format="wav", samples_per_second=44100)
write_file_op = tf.write_file(output_filename, encoded_audio_data)

with tf.Session() as sess:
    sess.run(write_file_op, {input_filename: "input.wav",
                             output_filename: "output.wav"})
TensorFlow 2.x:
The tf.contrib module was removed in TensorFlow 2.x, but you can still load and save audio files in 16-bit PCM WAV format using eager execution and tf.audio:
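import tensorflow as tf

# decode_wav expects the WAV file contents (a string tensor), not a filename.
# Returns a tuple of Tensor objects (audio, sample_rate).
audio, sample_rate = tf.audio.decode_wav(tf.io.read_file("input.wav"))
# Returns a scalar Tensor of type string holding the WAV-encoded bytes.
encoded = tf.audio.encode_wav(audio, sample_rate)
tf.io.write_file("output.wav", encoded)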
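To check the TensorFlow 2.x round trip without needing a file on disk, here is a minimal sketch that encodes a synthetic signal to WAV bytes and decodes it again. The 16 kHz sample rate, the 440 Hz test tone, and the variable names are assumptions chosen for illustration, not part of the original answer.

```python
import math

import tensorflow as tf

sample_rate = 16000  # assumed sample rate for this sketch

# One second of a 440 Hz sine wave, shaped [samples, channels],
# with float32 values inside the [-1.0, 1.0] range that encode_wav expects.
t = tf.range(sample_rate, dtype=tf.float32) / sample_rate
signal = 0.5 * tf.sin(2.0 * math.pi * 440.0 * t)[:, tf.newaxis]

# Encode to 16-bit PCM WAV bytes (a scalar string tensor).
wav_bytes = tf.audio.encode_wav(signal, sample_rate)

# Decode the bytes back into a float32 audio tensor and its sample rate.
decoded, decoded_rate = tf.audio.decode_wav(wav_bytes)

# 16-bit quantization makes the round trip approximate, not bit-exact.
max_error = float(tf.reduce_max(tf.abs(decoded - signal)))
```

Passing wav_bytes to tf.io.write_file would store the same data on disk, and tf.io.read_file would load it back for decode_wav.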

dynamicwebpaige

mrry