
I want to know how to get samples out of a .wav file in order to perform a windowed join of two .wav files.

Can anyone please tell me how to do this?

MaxPowers
kaki

4 Answers


The wave module of the standard library is the key: after import wave at the top of your code, wave.open('the.wav', 'r') returns a "wave read" object from which you can read frames with the .readframes method. That method returns a string of bytes (a bytes object in Python 3) containing the samples in whatever format the wave file stores them. Two methods tell you how to decompose frames into samples: .getnchannels gives the number of channels (samples per frame) and .getsampwidth gives the number of bytes per sample.
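A minimal Python 3 sketch of those calls, using only the standard library. To stay self-contained it first writes a tiny stereo 16-bit file into an in-memory buffer (the make_test_wav helper and its sample values are hypothetical, for illustration only), then reads the parameters back. Since one frame holds one sample per channel, readframes(1) returns nchannels * sampwidth bytes.

```python
import io
import struct
import wave

def make_test_wav(samples, nchannels=2, framerate=8000):
    """Hypothetical helper: write interleaved 16-bit samples to an in-memory .wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(nchannels)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(framerate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf

def describe(wav_source):
    """Return (channels, bytes per sample, frame count, raw bytes of frame 1)."""
    with wave.open(wav_source, "rb") as w:
        nchannels = w.getnchannels()   # samples per frame
        sampwidth = w.getsampwidth()   # bytes per sample
        first_frame = w.readframes(1)  # one frame = nchannels * sampwidth bytes
        return nchannels, sampwidth, w.getnframes(), first_frame

nchannels, sampwidth, nframes, first = describe(make_test_wav([1, -1, 2, -2]))
print(nchannels, sampwidth, nframes, len(first))  # 2 2 2 4
```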

The best way to turn the string of bytes into a sequence of numbers is with the array module, using the typecode 'B', 'h' or 'i' for 1, 2 or 4 bytes per sample respectively (8-bit WAV samples are unsigned, wider ones are signed; the sizes of the integer typecodes depend on the platform, so use the itemsize value of your array object to double-check). WAV data is little-endian, so call .byteswap() on the array if you're on a big-endian machine. If your sample width has no matching array type (24-bit, for example), you'll need to slice up the byte string, pad each slice appropriately with zero bytes (e.g. prepend b'\x00' to a 3-byte slice, unpack it as '<i' and shift the result right by 8 bits) and use the struct module instead; that's clunkier and slower, so use array when you can.
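A minimal Python 3 sketch of the array approach, assuming uncompressed integer PCM input. It uses the signed typecodes 'h'/'i' (and unsigned 'B' for 8-bit), since 16- and 32-bit WAV samples are signed; read_samples_array and TYPECODES are illustrative names, not part of any library. Widths with no matching typecode, such as 24-bit, simply raise an error here.

```python
import array
import io
import struct
import sys
import wave

# Typecodes per sample width: 8-bit WAV is unsigned, wider samples are signed.
TYPECODES = {1: "B", 2: "h", 4: "i"}

def read_samples_array(wav_source):
    """Read every sample of a PCM .wav into an array (channels interleaved)."""
    with wave.open(wav_source, "rb") as w:
        width = w.getsampwidth()
        data = w.readframes(w.getnframes())
    if width not in TYPECODES:
        raise ValueError("unsupported sample width: %d bytes" % width)
    samples = array.array(TYPECODES[width])
    if samples.itemsize != width:  # typecode size is platform dependent
        raise ValueError("no %d-byte array type on this platform" % width)
    samples.frombytes(data)
    if sys.byteorder == "big" and width > 1:
        samples.byteswap()  # WAV data is little-endian
    return samples

# Usage with a tiny in-memory mono 16-bit file (the values are made up).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(struct.pack("<3h", 0, 1000, -1000))
buf.seek(0)
print(list(read_samples_array(buf)))  # [0, 1000, -1000]
```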

Alex Martelli
  • When I try .getsampwidth it gives me a value of 2, meaning 2 bytes. When I try .readframes(1), which should return 1 frame, it returns something like "\x03\x16", which I guess is 2 bytes, so does that mean 1 frame has only 1 sample? What is getnchannels used for? I want to take the samples from each frame separately and represent them as integers; how can I do that? – kaki Jun 17 '10 at 06:42
  • @kaki, each frame holds one sample from each channel: the first frame has the first sample of every channel, the second frame the second sample of every channel, and so on. So unless your sound is mono (just 1 channel), you have to decide what to do with the channels (skip all but one, average them, whatever). Say it's 1 channel (mono), the simplest case: then `x = array.array('h', w.readframes(1))` gives you in `x` an array with all the samples of the first frame (the next one, if in a loop) as integers, just as you say you want (`h`, not `H`: they're signed). If it's stereo, 2 channels, the even indices of `x` hold e.g. the left-channel samples. Little-endian btw. – Alex Martelli Jun 17 '10 at 14:17
  • BTW, the format docs at https://ccrma.stanford.edu/courses/422/projects/WaveFormat/ do not use the concept of "frames" but rather "chunks" and "subchunks", but in the end it comes to much the same thing of course;-). – Alex Martelli Jun 17 '10 at 14:20
  • @kaki, you're welcome -- but do consider accepting the answer that has helped you (by clicking on the checkmark outline on the Q's left), as that is really fundamental SO etiquette! – Alex Martelli Jun 18 '10 at 14:17
  • @AlexMartelli Sorry for grave digging, but I'm trying to use .readframes(1) and I need it to split the two samples into two separate variables (basically put the first left sample in variable X, and the first right sample into variable Y), but I don't have a clue how to do this, any suggestions? – MarcusJ Apr 12 '14 at 10:19
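The interleaving described in the comments above can be sketched as follows, assuming 16-bit little-endian PCM; split_channels and split_stereo_frame are hypothetical helper names. Slicing with a step of nchannels separates the channels, and for a single stereo frame from readframes(1), struct.unpack('<hh', ...) yields the left and right samples directly, which is one way to get the X/Y pair asked about in the last comment.

```python
import struct

def split_channels(samples, nchannels):
    """Deinterleave [L0, R0, L1, R1, ...] into one list per channel."""
    return [list(samples[ch::nchannels]) for ch in range(nchannels)]

def split_stereo_frame(frame_bytes):
    """Unpack one 16-bit stereo frame (4 bytes) into (left, right)."""
    left, right = struct.unpack("<hh", frame_bytes)
    return left, right

interleaved = [10, -10, 20, -20, 30, -30]
left, right = split_channels(interleaved, 2)
print(left, right)  # [10, 20, 30] [-10, -20, -30]

x, y = split_stereo_frame(struct.pack("<hh", 5, -7))
print(x, y)  # 5 -7
```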

You can use the wave module. First read the metadata, such as the sample width and the number of channels. The readframes() method returns the samples only as a byte string, so you have to convert them to numbers with struct.unpack(), using a format string that matches the sample format.

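Alternatively, if you want the samples as a NumPy array, you can use SciPy's io.wavfile module. Note that scipy.io.wavfile.read returns the sample rate and an array whose dtype follows the file (e.g. int16 for 16-bit PCM), so you only get floating-point values directly from floating-point WAV files.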

Lukáš Lalinský
  • Can you tell me how to get the samples as an array of floating-point numbers without using SciPy? – kaki Jun 17 '10 at 06:59
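To answer the last comment without SciPy, here is a standard-library sketch (Python 3.2+ for int.from_bytes) that turns the bytes returned by readframes() into floats. int.from_bytes plays the role of struct.unpack here and also handles 24-bit samples, for which struct has no format code. The function name is hypothetical, and it assumes uncompressed integer PCM; dividing by 2^(bits-1) maps the values into [-1.0, 1.0).

```python
def pcm_to_floats(frame_data, sampwidth):
    """Convert raw little-endian PCM bytes to floats in [-1.0, 1.0).

    Assumes integer PCM: 8-bit samples are unsigned (silence = 128),
    16/24/32-bit samples are signed two's complement.
    """
    scale = float(2 ** (8 * sampwidth - 1))
    out = []
    for i in range(0, len(frame_data) - sampwidth + 1, sampwidth):
        chunk = frame_data[i:i + sampwidth]
        if sampwidth == 1:
            value = chunk[0] - 128  # re-center unsigned 8-bit samples
        else:
            value = int.from_bytes(chunk, "little", signed=True)
        out.append(value / scale)
    return out

print(pcm_to_floats(bytes([128, 0, 192]), 1))           # [0.0, -1.0, 0.5]
print(pcm_to_floats(b"\x00\x00\x40\x00\x00\x80", 3))    # [0.5, -1.0]
```

With a file opened via w = wave.open(...), you would call it as pcm_to_floats(w.readframes(w.getnframes()), w.getsampwidth()); channels stay interleaved.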

Here's a function to read samples from a wave file (tested with mono & stereo):

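import struct

def read_samples(wave_file, nb_frames):
    # Returns up to nb_frames frames as a tuple of ints, channels interleaved.
    # 8-bit WAV samples are unsigned ("B"); 16/32-bit are signed little-endian.
    frame_data = wave_file.readframes(nb_frames)
    if frame_data:
        sample_width = wave_file.getsampwidth()
        nb_samples = len(frame_data) // sample_width
        fmt = {1: "<%dB", 2: "<%dh", 4: "<%dl"}[sample_width] % nb_samples
        return struct.unpack(fmt, frame_data)
    else:
        return ()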

And here's the full script that does windowed mixing or concatenation of multiple .wav files. All input files need to have the same parameters (number of channels, sample width, frame rate and compression type).

import argparse
import itertools
import struct
import sys
import wave

def _struct_format(sample_width, nb_samples):
    # 8-bit WAV samples are unsigned, 16/32-bit samples are signed little-endian
    return {1: "<%dB", 2: "<%dh", 4: "<%dl"}[sample_width] % nb_samples

def _mix_samples(samples):
    return sum(samples) // len(samples)

def read_samples(wave_file, nb_frames):
    frame_data = wave_file.readframes(nb_frames)
    if frame_data:
        sample_width = wave_file.getsampwidth()
        nb_samples = len(frame_data) // sample_width
        fmt = _struct_format(sample_width, nb_samples)
        return struct.unpack(fmt, frame_data)
    else:
        return ()

def write_samples(wave_file, samples, sample_width):
    fmt = _struct_format(sample_width, len(samples))
    frame_data = struct.pack(fmt, *samples)
    wave_file.writeframes(frame_data)

def compatible_input_wave_files(input_wave_files):
    nchannels, sampwidth, framerate, nframes, comptype, compname = input_wave_files[0].getparams()
    for input_wave_file in input_wave_files[1:]:
        nc, sw, fr, nf, ct, cn = input_wave_file.getparams()
        if (nc, sw, fr, ct, cn) != (nchannels, sampwidth, framerate, comptype, compname):
            return False
    return True

def mix_wave_files(output_wave_file, input_wave_files, buffer_size):
    output_wave_file.setparams(input_wave_files[0].getparams())
    sampwidth = input_wave_files[0].getsampwidth()
    max_nb_frames = max(input_wave_file.getnframes() for input_wave_file in input_wave_files)
    for frame_window in range(max_nb_frames // buffer_size + 1):
        all_samples = [read_samples(wave_file, buffer_size) for wave_file in input_wave_files]
        # files that have run out of frames are padded with 0 before averaging
        mixed_samples = [_mix_samples(samples) for samples in itertools.zip_longest(*all_samples, fillvalue=0)]
        write_samples(output_wave_file, mixed_samples, sampwidth)

def concatenate_wave_files(output_wave_file, input_wave_files, buffer_size):
    output_wave_file.setparams(input_wave_files[0].getparams())
    sampwidth = input_wave_files[0].getsampwidth()
    for input_wave_file in input_wave_files:
        nb_frames = input_wave_file.getnframes()
        for frame_window in range(nb_frames // buffer_size + 1):
            samples = read_samples(input_wave_file, buffer_size)
            if samples:
                write_samples(output_wave_file, samples, sampwidth)

def argument_parser():
    parser = argparse.ArgumentParser(description='Mix or concatenate multiple .wav files')
    parser.add_argument('command', choices=("mix", "concat"), help='command')
    parser.add_argument('output_file', help='output .wav file')
    parser.add_argument('input_files', metavar="input_file", help='input .wav files', nargs="+")
    parser.add_argument('--buffer_size', type=int, help='nb of frames to read per iteration', default=1000)
    return parser

if __name__ == '__main__':
    args = argument_parser().parse_args()

    input_wave_files = [wave.open(name, "rb") for name in args.input_files]
    if not compatible_input_wave_files(input_wave_files):
        print("ERROR: mixed wave files must have the same params.")
        sys.exit(2)

    output_wave_file = wave.open(args.output_file, "wb")
    if args.command == "mix":
        mix_wave_files(output_wave_file, input_wave_files, args.buffer_size)
    elif args.command == "concat":
        concatenate_wave_files(output_wave_file, input_wave_files, args.buffer_size)

    output_wave_file.close()
    for input_wave_file in input_wave_files:
        input_wave_file.close()
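The script above joins files back to back. If "windowed join" in the question means overlapping the end of one file with the start of the next, a crossfade is one common interpretation (an assumption; the script does not do this). A minimal sketch for mono integer samples, such as those returned by read_samples, using a linear fade; crossfade_join and overlap are hypothetical names. For stereo, deinterleave first and apply it per channel. Because each output sample is a weighted average of two in-range samples, the result cannot overflow the sample width.

```python
def crossfade_join(a, b, overlap):
    """Join two mono sample sequences with a linear crossfade.

    The last `overlap` samples of `a` fade out while the first `overlap`
    samples of `b` fade in, so the result has len(a) + len(b) - overlap samples.
    """
    overlap = max(0, min(overlap, len(a), len(b)))
    start = len(a) - overlap
    joined = list(a[:start])
    for i in range(overlap):
        t = (i + 0.5) / overlap  # position in the window, strictly between 0 and 1
        mixed = (1.0 - t) * a[start + i] + t * b[i]
        joined.append(int(round(mixed)))
    joined.extend(b[overlap:])
    return joined

print(crossfade_join([0, 0, 0, 0], [1000, 1000, 1000, 1000], 2))
# [0, 0, 250, 750, 1000, 1000]
```

The joined list can then be written with write_samples from the script, provided both inputs share the same parameters.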
MiniQuark

After reading the samples (for example with the wave module), you may want to scale the values to between -1 and 1, which is the usual convention for audio signals.

In this case, you can add:

import numpy as np

# scale to -1.0 -- 1.0 (signed integer PCM; nb_bits is e.g. 16)
max_nb_bit = float(2 ** (nb_bits - 1))
samples = np.asarray(signal_int) / max_nb_bit  # result lies in [-1.0, 1.0)

where nb_bits is the bit depth and signal_int holds the integer sample values. 8-bit WAV samples are unsigned, so subtract 128 from them first.
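Going the other way (for example before calling writeframes), the scaled floats have to be multiplied back and clipped, because a value of exactly 1.0 maps to 2^(bits-1), one past the largest signed value. A sketch for 16-bit output, assuming a scale factor of 2**15; the function name is hypothetical.

```python
import struct

def floats_to_int16_bytes(samples):
    """Convert floats (nominally -1.0..1.0) to little-endian 16-bit PCM bytes.

    Out-of-range values are clipped instead of wrapping around.
    """
    ints = []
    for s in samples:
        v = int(round(s * 32768.0))
        ints.append(max(-32768, min(32767, v)))  # clip to the int16 range
    return struct.pack("<%dh" % len(ints), *ints)

print(struct.unpack("<5h", floats_to_int16_bytes([0.0, 0.5, -1.0, 1.0, 2.0])))
# (0, 16384, -32768, 32767, 32767)
```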

PatriceG