5

I have a list of .wav files as binary data (they arrive over a websocket), and I want to join them into a single binary .wav file so that I can run speech recognition on it. I got it working with the following code:

audio = [binary_wav1, binary_wav2,..., binary_wavN] # a list of .wav binary files coming from a socket
audio = [io.BytesIO(x) for x in audio]

# Join wav files
with wave.open('/tmp/input.wav', 'wb') as temp_input:
    params_set = False
    for audio_file in audio:
        with wave.open(audio_file, 'rb') as w:
            if not params_set:
                temp_input.setparams(w.getparams())
                params_set = True
            temp_input.writeframes(w.readframes(w.getnframes()))

# Do speech recognition
with open('/tmp/input.wav', 'rb') as f:
    binary_audio = f.read()
ASR(binary_audio)

The problem is that I don't want to write the file '/tmp/input.wav' to disk. Is there any way to do this without writing any file to disk?

Thanks.

  • Sound can be represented as 1D array when mono, 2d as stereo. Use something like `wavefile` to get the raw data. – Josef Korbel May 24 '18 at 20:29
  • `wave.open` accepts either a file path or a file-like object. You've already imported `BytesIO`, so just use one of those as a file-like buffer. [Here's](https://stackoverflow.com/questions/26879981/writing-then-reading-in-memory-bytes-bytesio-gives-a-blank-result) an example of someone doing basically just that with `gzip` (note the slightly different argument names). – Aaron May 24 '18 at 20:32

2 Answers

5

The general way to have a file-like object without ever writing it to disk is an in-memory stream. The standard-library `io` module provides these, and you already use `io.BytesIO` earlier in your code.

import io
import wave

audio = [binary_wav1, binary_wav2,..., binary_wavN] # a list of .wav binary files coming from a socket
audio = [io.BytesIO(x) for x in audio]

# Join wav files

params_set = False
temp_file = io.BytesIO()
with wave.open(temp_file, 'wb') as temp_input:
    for audio_file in audio:
        with wave.open(audio_file, 'rb') as w:
            if not params_set:
                temp_input.setparams(w.getparams())
                params_set = True
            temp_input.writeframes(w.readframes(w.getnframes()))

# Move the cursor back to the beginning of the "file"
temp_file.seek(0)
# Do speech recognition
binary_audio = temp_file.read()
ASR(binary_audio)

Note: I don't have any .wav files to test this on, so it is up to the `wave` module to handle the difference between real files and in-memory streams properly.
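For anyone who wants to try the approach without real recordings, here is a minimal self-contained sketch. The helper names `make_wav_bytes` and `join_wavs` are made up for illustration and are not part of the `wave` module; the silent mono 16-bit test data and the format check are assumptions added here, not something from the question.

```python
import io
import wave


def make_wav_bytes(n_frames, framerate=16000):
    """Build a silent mono 16-bit WAV in memory (hypothetical test data)."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(framerate)
        w.writeframes(b'\x00\x00' * n_frames)
    return buf.getvalue()


def join_wavs(chunks):
    """Concatenate complete WAV byte strings into a single WAV byte string."""
    if not chunks:
        raise ValueError('need at least one WAV chunk')
    out = io.BytesIO()
    with wave.open(out, 'wb') as writer:
        fmt = None
        for chunk in chunks:
            with wave.open(io.BytesIO(chunk), 'rb') as reader:
                chunk_fmt = (reader.getnchannels(),
                             reader.getsampwidth(),
                             reader.getframerate())
                if fmt is None:
                    # The first chunk decides the output format.
                    fmt = chunk_fmt
                    writer.setnchannels(chunk_fmt[0])
                    writer.setsampwidth(chunk_fmt[1])
                    writer.setframerate(chunk_fmt[2])
                elif chunk_fmt != fmt:
                    # Appending frames in another format would garble the audio.
                    raise ValueError('WAV chunks have different formats')
                writer.writeframes(reader.readframes(reader.getnframes()))
    # getvalue() returns the whole buffer regardless of the cursor position.
    return out.getvalue()


joined = join_wavs([make_wav_bytes(100), make_wav_bytes(250)])
```

Using `getvalue()` instead of `seek(0)` followed by `read()` is just a convenience; both return the same bytes here. The format check is optional, but without it chunks recorded at different sample rates would be joined silently.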

Aaron
  • Thanks, this works! I had tried it before but I was missing the `temp_file.seek(0)` statement, so I was just reading an empty binary object. – Iñigo Casanueva May 25 '18 at 08:33
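The empty read described in this comment is easy to reproduce with `io.BytesIO` alone. A small sketch (the payload bytes are arbitrary placeholder data, not a valid WAV file):

```python
import io

buf = io.BytesIO()
buf.write(b'RIFF-example')  # writing leaves the cursor at the end of the data

at_end = buf.read()         # reads from the cursor onward, so nothing is left
whole = buf.getvalue()      # returns the full contents, ignoring the cursor

buf.seek(0)                 # rewind to the start
after_seek = buf.read()     # now the whole payload is read

print(at_end)      # b''
print(whole)       # b'RIFF-example'
print(after_seek)  # b'RIFF-example'
```

The same thing happens after writing through `wave.open`: the stream position ends up at the end of the data, so either rewind with `seek(0)` or call `getvalue()`.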
0

With scipy and numpy you can read the wav files as numpy arrays and then make whatever modifications you want.

from scipy.io import wavfile
import numpy as np

# load files; wavfile.read returns (sample_rate, data)
rate, arr1 = wavfile.read('song.wav')
_, arr2 = wavfile.read('Aaron_Copland-Quiet_City.wav')

print(arr1.shape)
print(arr2.shape)

>>> (1323001,)
>>> (1323000,)

# make new array by concatenating two audio waves
new_arr = np.hstack((arr1, arr2))
print(new_arr.shape)

>>> (2646001,)

# save new audio wave (assumes both files share the same sample rate)
wavfile.write('new_audio.wav', rate, new_arr)
ritchie46
  • This works, but adding a dependency to scipy and/or numpy seems overkill. As @Aaron pointed out in his answer, you can simply write to file-like objects like `BytesIO`. – Samuel Dion-Girardeau May 24 '18 at 21:17