Python3 modifying wav audio data correctly

Question

I am learning how to modify different types of audio files (.wav, .mp3, etcetera) with Python 3. This question is specifically about the .wav format and the wave module. I know there are ISO standards for audio formats; as a side note, any references on standards for the .wav file format would be greatly appreciated.

My question is about skipping the RIFF and fmt headers of a .wav file when using the Python 3 wave module.

Is there a more efficient way to skip the RIFF headers, other containers, and go straight to the data container to modify its contents?

This crude example converts a two-channel .wav file to a single-channel .wav file, setting every frame to (0, 0) along the way.

import wave
import struct

# Open Files
inf = wave.open(r"piano2.wav", 'rb')
outf = wave.open(r"output.wav", 'wb')

# Input Parameters
ip = list(inf.getparams())
print('Input Parameters:', ip)
# Example Output: Input Parameters: [2, 2, 48000, 302712, 'NONE', 'not compressed']

# Output Parameters
op = ip[:]
op[0] = 1
outf.setparams(op)

number_of_channels, sample_width, frame_rate, number_of_frames, comp_type, comp_name = ip

format = '<{}h'.format(number_of_channels)
print('# Channels:', format)

# Read >> Second
for index in range(number_of_frames):
    frame = inf.readframes(1)
    data = struct.unpack(format, frame)

    # Here, I change data to (0, 0), testing purposes
    print('Before Audio Data:', data)
    print('After Modifying Audio Data', (0, 0))

    # Change Audio Data
    data = (0, 0)

    value = data[0]
    value = (value * 2) // 3
    outf.writeframes(struct.pack('<h', value))

# Close In File
inf.close()
# Close Out File
outf.close()

Is there a better practice, or reference material, for modifying just the data segment of a .wav file?

For instance, adding a sound at a specific timestamp would be a more representative use case for my question.

Lukasz Tracewski · Accepted Answer · 2021-05-09T09:16:41.260

Performance comparison

Let's first examine three ways to read WAVE files.

The slowest one - wave module

As you might have noticed already, the wave module can be painfully slow. Consider this code:

import wave
import struct

wavefile = wave.open('your.wav', 'r') # check e.g. freesound.org for samples

length = wavefile.getnframes()
for i in range(0, length):
    wavedata = wavefile.readframes(1)
    data = struct.unpack("<h", wavedata)

For a WAVE as defined below:

Input File     : 'audio.wav'
Channels       : 1
Sample Rate    : 48000
Precision      : 16-bit
Duration       : 00:09:35.71 = 27634080 samples ~ 43178.2 CDDA sectors
File Size      : 55.3M
Bit Rate       : 768k
Sample Encoding: 16-bit Signed Integer PCM

it took on average 27.7 s to load the full audio. The upside of the wave module is that it is available out of the box and will work on any system.
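Much of that time is likely spent in the Python-level loop (one `readframes` call and one `unpack` call per frame) rather than in the file access itself. A minimal sketch, assuming 16-bit PCM input, that reads every frame in one call and unpacks them together; the helper name `read_all_samples` is made up for illustration:

```python
import struct
import wave


def read_all_samples(path):
    """Return (params, samples) for a 16-bit PCM WAVE file.

    samples is a flat tuple of ints; for stereo input the channels
    are interleaved (L, R, L, R, ...).
    """
    with wave.open(path, 'rb') as wav:
        params = wav.getparams()
        if params.sampwidth != 2:
            raise ValueError('this sketch only handles 16-bit samples')
        raw = wav.readframes(params.nframes)  # one call instead of nframes calls
    count = len(raw) // 2  # two bytes per 16-bit sample
    samples = struct.unpack('<{}h'.format(count), raw)
    return params, samples
```

This stays within the standard library, so it keeps the portability of the wave module while avoiding the per-frame overhead.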

The convenient one - audiofile

A much more convenient and faster solution is e.g. audiofile. According to the project description, its focus is on reading speed.

import audiofile as af

signal, sampling_rate = af.read('audio.wav')

This gave me on average 33 ms to read the mentioned file.

The fastest one - numpy

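If we decide to skip the header (as OP asks) and go solely for speed, numpy is a great choice:

import numpy as np

filename = 'audio.wav'

# Bytes 40-43 of a canonical header hold the size of the data chunk in bytes
byte_length = np.fromfile(filename, dtype=np.int32, count=1, offset=40)[0]
# The samples start right after the 44-byte header
data = np.fromfile(filename, dtype=np.int16, count=byte_length // np.dtype(np.int16).itemsize, offset=44)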

The header structure (that tells us what offset to use) is defined here.

Running that code takes ~6 ms, about 5x less than audiofile. Naturally it comes with a price / preconditions: we need to know the data type in advance, and the file must have the canonical 44-byte header layout for the hard-coded offsets to be right.
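The fixed offsets 40 and 44 hold only when the `fmt ` chunk is 16 bytes long and is immediately followed by the `data` chunk. Files that carry extra chunks (for example `LIST` metadata) shift the samples. A hedged sketch of walking the RIFF chunks to locate `data`, using only the standard library; `find_data_chunk` is a hypothetical helper, not part of any library:

```python
import struct


def find_data_chunk(path):
    """Return (offset, size) of the sample bytes in a RIFF/WAVE file."""
    with open(path, 'rb') as f:
        riff, _, wave_id = struct.unpack('<4sI4s', f.read(12))
        if riff != b'RIFF' or wave_id != b'WAVE':
            raise ValueError('not a RIFF/WAVE file')
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError('no data chunk found')
            chunk_id, chunk_size = struct.unpack('<4sI', header)
            if chunk_id == b'data':
                return f.tell(), chunk_size
            # Skip this chunk; chunks are padded to an even number of bytes.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

The returned values could then replace the hard-coded numbers, e.g. `np.fromfile(filename, dtype=np.int16, count=size // 2, offset=offset)` for 16-bit samples.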

Modifying the audio

Once you have the audio in a numpy array, you can modify it at will. You can also decide to stream the file rather than reading everything at once. Be warned though: since sound is a wave, simply injecting new data at an arbitrary time t will typically distort the audio (unless the audio at that point was silence).
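To add a sound at a given timestamp, as the question asks, one option is to sum the new samples onto the existing ones instead of overwriting them. A minimal sketch, assuming both arrays are mono `int16` at the same sample rate; `mix_at` and its parameters are illustrative names, not an existing API:

```python
import numpy as np


def mix_at(signal, clip, fs, start_seconds):
    """Return a copy of signal with clip added in, starting at start_seconds."""
    start = int(round(start_seconds * fs))
    if start < 0:
        raise ValueError('start_seconds must not be negative')
    end = min(start + len(clip), len(signal))  # drop whatever runs past the end
    out = signal.astype(np.int32)  # astype copies; int32 cannot overflow on the sum
    if end > start:
        out[start:end] += clip[:end - start]
    # Clamp to the int16 range before narrowing back.
    return np.clip(out, -32768, 32767).astype(np.int16)
```

Clamping is itself a form of distortion, so in practice one might scale either input down first, or fade the clip in and out.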

As for writing the stream back, "modifying the container" would be terribly slow in Python. That's why you should either use arrays or switch to a more suitable language (e.g. C).

If we go with arrays, we should mind that numpy knows nothing about the WAVE format and therefore we'd have to define the header ourselves and write individual bytes. Perfectly feasible exercise, but clunky. Luckily, scipy provides a convenient function that has the benefits of numpy speed (it uses numpy underneath), while making the code much more readable:

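import numpy as np
from scipy.io.wavfile import write

# Sample rate is stored at byte 24 of the header
fs = int(np.fromfile('audio.wav', dtype=np.int32, count=1, offset=24)[0])

# data is the int16 array read earlier. numpy arrays have no append method,
# so build a new array with concatenate (2 seconds of zeros, mono).
silence = np.zeros(2 * fs, dtype=data.dtype)
new_data = np.concatenate([data, silence])
write('audio_out.wav', fs, new_data)  # writes a complete WAVE file, header included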
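For comparison, the do-it-yourself route mentioned above is not long for plain PCM. A sketch of the canonical 44-byte header built with `struct`, assuming uncompressed integer PCM, no extra chunks, and an even number of data bytes; `write_pcm_wav` is a hypothetical helper:

```python
import struct


def write_pcm_wav(path, pcm_bytes, sample_rate, channels=1, sample_width=2):
    """Write raw little-endian PCM bytes as a minimal RIFF/WAVE file."""
    block_align = channels * sample_width   # bytes per frame
    byte_rate = sample_rate * block_align   # bytes per second
    header = struct.pack(
        '<4sI4s4sIHHIIHH4sI',
        b'RIFF', 36 + len(pcm_bytes), b'WAVE',
        b'fmt ', 16,        # the fmt chunk body is 16 bytes for plain PCM
        1,                  # format tag 1 = integer PCM
        channels, sample_rate, byte_rate, block_align,
        sample_width * 8,   # bits per sample
        b'data', len(pcm_bytes),
    )
    with open(path, 'wb') as f:
        f.write(header)
        f.write(pcm_bytes)
```

With a numpy array this could be called as `write_pcm_wav('audio_out.wav', new_data.astype('<i2').tobytes(), fs)`.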

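The same read-modify-write cycle could be done in a loop, where you read a chunk with numpy / scipy, modify the array (data) and write it out. Note that scipy's write emits a complete header on every call, so the chunks have to be collected into one array (or written through an API that keeps the output file open) rather than appended with repeated write calls.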
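One way to sketch such a loop is to use the `wave` module on both ends, since it fills in the header sizes when the output is closed, and numpy only for the arithmetic. This assumes 16-bit PCM; the function name, the `gain` parameter and the chunk size are arbitrary choices for illustration:

```python
import wave

import numpy as np


def scale_in_chunks(src, dst, gain=0.5, chunk_frames=65536):
    """Copy src to dst, multiplying every 16-bit sample by gain."""
    with wave.open(src, 'rb') as inf, wave.open(dst, 'wb') as outf:
        if inf.getsampwidth() != 2:
            raise ValueError('this sketch only handles 16-bit samples')
        outf.setparams(inf.getparams())
        while True:
            raw = inf.readframes(chunk_frames)
            if not raw:
                break  # end of the input file
            samples = np.frombuffer(raw, dtype='<i2')
            # The multiplication yields floats; clamp, then narrow back to int16.
            scaled = np.clip(samples * gain, -32768, 32767).astype('<i2')
            outf.writeframes(scaled.tobytes())
```

Only one chunk is held in memory at a time, so the approach also works for files that do not fit in RAM.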

I am approving your post as very informative; it answered my question. I would love it if you could post an example of writing bytes using the `numpy` module with an array and `audioread`. This is the part I am having difficulty with, since the byte types and formatting are new to me in Python. Just a simple example of replacing the bytes, if it's not much trouble. If not, no worries, and thanks for the detailed post. — ABC, May 09 '21 at 08:37
Thanks, and I get what you need. Mind though that writing audio with `numpy` is clunky (this post would get 2x longer). `numpy` knows nothing about WAVE, which means we'd have to define the header. In this case I'd use [scipy write](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.write.html), which is a wrapper over numpy that has the required functionality and works on `numpy` arrays. What do you think? — Lukasz Tracewski, May 09 '21 at 08:43
@Luckasz I appreciate what you have done, and will research scipy_write. I think I get the gist: it does get clunky, which becomes confusing. Nonetheless, thanks for everything. This has been a great help and an informative post. Also, thanks for providing the faster alternatives, which will be very helpful in the future. I understand you would have to write a `struct` for the containers of the **.wav** files. — ABC, May 09 '21 at 08:45
I hope the extra explanation will be helpful. Sure, you can do what you just described, but I am reasonably certain this will make your code slow and complex (that is, if you go with your own `struct`). `SciPy` should be great here and there is no penalty for using it instead of pure `numpy`. Good luck! — Lukasz Tracewski, May 09 '21 at 09:20
Great answer and sample. It seems `data.append` will need to be replaced, as it is not working, but everything else seems to be perfectly in line, from what I can read. — ABC, May 11 '21 at 03:14