Combining wav files with space between in Python

Question

I'm looking to combine wave files in Python with silence between them. The code here using wave works perfectly well:

But I need to put some space between the files so that different pairs of files are spaced the same way. I've got this code to figure out the space needed between wavs to make the total 10s:

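import wave

file_1 = wave.open('file_1.wav')
file_2 = wave.open('file_2.wav')

# Add up the duration of both files in seconds (frames / frames per second).
total_length = 0
for item in [file_1, file_2]:
    item_length = item.getnframes() / item.getframerate()
    total_length = total_length + item_length

# Seconds of silence needed to bring the total to 10 s
# (negative if the files already run longer than that).
space_between = 10 - total_length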

So now I need to know if there's a way to use the space_between variable I've created in the code linked above to space apart my two files when combining them. I've seen some scripts for generating waveforms of different types and I've figured out how to make them essentially silent but I can't specify a length, or at least can't figure out how to. Any ideas?

EDIT: I was able to find a way to do this by combining the above code with the code here given to make silent files of a specified length: python - how can I generate a WAV file with beeps?

Thanks folks!

By "space between", do you mean you want to insert silence between them? Assuming these are PCM files, silence is 0. Just add N frames of 0. — Tim Roberts, Jul 27 '21 at 19:18
So you'd multiply space_between by the frame rate. Of course, if the wav files have different frame rates or are longer than 10 seconds combined, you have issues. — Garr Godfrey, Jul 27 '21 at 19:22
Right, it makes sense that I'm doing that much - but where am I generating this silence exactly? I've got several options for how to combine the wav files, but so far I'm using this: https://stackoverflow.com/a/2900266/14132599 So at what point do I say "add N frames of 0"? — Michael Clauss, Jul 27 '21 at 19:44
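
To make the "add N frames of 0" suggestion from the comments concrete, here is a minimal sketch using only the wave module. The function name join_with_silence and its parameters are made up for illustration, and it assumes both inputs are uncompressed PCM with the same channel count, sample width and frame rate.

```python
import wave

def join_with_silence(path_a, path_b, out_path, total_seconds=10.0):
    """Write path_a, a silent gap, then path_b, so the result lasts total_seconds."""
    with wave.open(path_a, "rb") as a, wave.open(path_b, "rb") as b:
        # Assumption: both inputs share channels, sample width and frame rate.
        if a.getparams()[:3] != b.getparams()[:3]:
            raise ValueError("input files have different audio formats")
        nchannels, sampwidth, framerate = a.getparams()[:3]
        used_frames = a.getnframes() + b.getnframes()
        frames_a = a.readframes(a.getnframes())
        frames_b = b.readframes(b.getnframes())

    # space_between, expressed in frames instead of seconds.
    gap_frames = round(total_seconds * framerate) - used_frames
    if gap_frames < 0:
        raise ValueError("the two files are already longer than total_seconds")

    # 8-bit PCM is unsigned, so its silence is 0x80; wider samples use 0.
    silent_byte = b"\x80" if sampwidth == 1 else b"\x00"
    silence = silent_byte * (gap_frames * nchannels * sampwidth)

    with wave.open(out_path, "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(sampwidth)
        out.setframerate(framerate)
        # This is the point where the silence goes in: between the two payloads.
        out.writeframes(frames_a + silence + frames_b)
```

Something like join_with_silence('file_1.wav', 'file_2.wav', 'combined.wav') would then produce a 10 second file, provided the two inputs are compatible and together shorter than 10 seconds.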

CLipp · Answer 1 · 2021-07-27T20:01:56.010

You first need to understand the RIFF header. You will want to open your wavs as binary data.

The first 4 bytes of the header are the 'magic number'. In this case, 52 49 46 46, or RIFF. This is in big-endian (BE) format, meaning the bytes are read in file order as ASCII text.

The next 4 bytes are the file size minus 8 bytes (the RIFF tag and this size field itself are not counted). This is in little-endian (LE) format, so read these 4 bytes in reverse order. The value varies with the size of the wav file.

The next 4 bytes are BE and will always be 57 41 56 45, or WAVE because it's a wav file.

The next 4 bytes are BE and will always be 66 6D 74 20, or fmt . That's fmt with a space after. From here things become more variable based on the wave, and having correct matching values is critical to this being somewhat easy.

The fmt chunk size is 4 bytes LE; this will likely be 10 00 00 00 (16) or 28 00 00 00 (40). The fmt chunk lets the audio processor determine how to read the actual audio data.

The next 2 bytes are LE and determine the format. 01 00 means PCM, or pulse code modulation. This is usually set by the recording software and shouldn't be changed.

The next 2 bytes are LE and determine if the audio file is mono (01 00) or stereo (02 00).

The next 4 bytes are LE and determine the sample rate. This is highly variable but must match between the files for proper reading. For instance, 44 AC 00 00 would be 44,100, as in 44,100 Hz.

The next 4 bytes are LE and determine the byte rate (bytes per second). This can be calculated as channels x sample rate x bits per sample / 8.

The next 2 bytes are LE and give the block alignment, the number of bytes in one frame. This can be calculated as channels x bits per sample / 8.

The next 2 bytes are LE and determine the bits per sample. 08 00 means 1 sample takes 1 byte.

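The next 4 bytes are BE and mark the start of the data chunk. In a simple PCM file this is 64 61 74 61, or data, although some files carry extra chunks before it. They are followed by 4 bytes LE giving the size of the audio data in bytes, which ends the header.

Everything following this is audio data. Here you would split the files, combine them, insert silence (gap in seconds x sample rate frames of zero bytes), then measure the entire length and create a new header.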
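
As a rough sketch of that last step, the snippet below packs a 44-byte header with struct and glues two raw PCM payloads together with silence in between. build_wav and its parameters are hypothetical names, and it assumes plain PCM with a 16-byte fmt chunk, no extra chunks, and an even data length (which holds for 16-bit samples).

```python
import struct

def build_wav(pcm_a, pcm_b, gap_seconds, channels, sample_rate, bits_per_sample):
    """Return WAV bytes: pcm_a, gap_seconds of silence, then pcm_b."""
    block_align = channels * bits_per_sample // 8   # bytes per frame
    byte_rate = sample_rate * block_align           # bytes per second
    # 8-bit PCM is unsigned, so its silence is 0x80; wider samples use 0.
    silent_byte = b"\x80" if bits_per_sample == 8 else b"\x00"
    silence = silent_byte * (round(gap_seconds * sample_rate) * block_align)
    data = pcm_a + silence + pcm_b

    # "<" makes every integer little-endian; the 4s fields are the ASCII tags.
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + len(data), b"WAVE",   # RIFF size = file size - 8
        b"fmt ", 16, 1,                     # fmt chunk size, format 1 = PCM
        channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", len(data),
    )
    return header + data
```

The pcm_a and pcm_b arguments are the bytes that follow the data size field in each source file; writing the returned bytes to a file opened in 'wb' mode gives the combined wav.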

Take the following header from a random wav file I have for example:

52 49 46 46 14 60 28 00 57 41 56 45 66 6D 74 20
10 00 00 00 01 00 01 00 22 56 00 00 44 AC 00 00
02 00 10 00 64 61 74 61

We see RIFF, then a file size of 00 28 60 14 (flipped endian), or 2,646,036, where the file is 2,646,044 bytes, so you see the 8-byte difference. Last on the top line we see WAVEfmt . Following that is 10 00 00 00, which tells us the fmt chunk is 16 bytes long. 01 00 tells us it is PCM. 01 00 tells us it is mono. The 00 00 56 22 (flipped endian) tells us the sample rate is 22,050 Hz. 00 00 AC 44 tells us the byte rate is 44,100 bytes per second. On the last line, 02 00 is the block alignment (2 bytes per frame), 10 00 means 16 bits per sample, and the ending is data.
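
If you would rather not flip the bytes by hand, the same 40 bytes can be decoded with struct. This is only a sketch for this particular layout (16-byte fmt chunk followed directly by data); the variable names are my own.

```python
import struct

# The example header above, exactly as shown in the hex dump.
header = bytes.fromhex(
    "52 49 46 46 14 60 28 00 57 41 56 45 66 6D 74 20 "
    "10 00 00 00 01 00 01 00 22 56 00 00 44 AC 00 00 "
    "02 00 10 00 64 61 74 61"
)

# "<" reads every integer as little-endian; the 4s fields are the ASCII tags.
(riff, riff_size, wave_id, fmt_id, fmt_size, audio_format, channels,
 sample_rate, byte_rate, block_align, bits_per_sample, data_id) = struct.unpack(
    "<4sI4s4sIHHIIHH4s", header
)

print(riff, riff_size)                # b'RIFF' 2646036
print(fmt_size, audio_format)         # 16 1
print(channels, sample_rate)          # 1 22050
print(byte_rate, block_align)         # 44100 2
print(bits_per_sample, data_id)       # 16 b'data'
```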

So we can determine the length of the audio by dividing the byte length, 2646036, by the byte rate, 44100. This gives us 60.0008 seconds, or 1 minute, which this audio file is.
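
As a quick check of that arithmetic: the stray 0.0008 comes from the header bytes that the RIFF size still counts. Assuming the canonical 44-byte header that the example shows, 36 of those bytes are included in the RIFF size, and subtracting them gives a round number.

```python
riff_size = 2_646_036   # RIFF size field from the example header
byte_rate = 44_100      # bytes per second from the example header

# Assumption: canonical 44-byte header, of which the RIFF size counts 36.
data_bytes = riff_size - 36

rough_seconds = riff_size / byte_rate   # header bytes included, about 60.0008
exact_seconds = data_bytes / byte_rate  # audio data only

print(round(rough_seconds, 4), exact_seconds)
```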
