This caused me a day's worth of headache, but since I've figured it out I wanted to post it somewhere in case it's helpful.

I am using Python's wave module to write data to a wave file. I'm NOT using scipy.io.wavfile because the data can be a huge vector (hours of audio at 16 kHz) that I don't want to, or can't, load into memory all at once. My understanding is that scipy.io.wavfile only gives you a full-file interface, while wave lets you read and write in buffers. I'd love to be corrected on that if I'm wrong.
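As a rough illustration of the buffered access described above, here is a minimal Python 3 sketch that writes a small file block by block and then streams it back. The helper name `iter_blocks`, the block sizes, and the temporary file are assumptions made up for this example, and it only handles 16-bit PCM.

```python
import os
import tempfile
import wave

import numpy as np


def iter_blocks(path, block_frames=4096):
    """Yield (n_frames, n_channels) int16 arrays, one block at a time."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "sketch assumes 16-bit PCM"
        n_channels = wf.getnchannels()
        while True:
            raw = wf.readframes(block_frames)
            if not raw:
                break
            yield np.frombuffer(raw, dtype="<i2").reshape(-1, n_channels)


# Write a small two-channel test file block by block, then stream it back.
path = os.path.join(tempfile.mkdtemp(), "blocks.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    for start in range(0, 10000, 2500):
        block = np.arange(start, start + 2500, dtype="<i2")
        frames = np.stack([block, -block], axis=1)  # (n_frames, n_channels)
        wf.writeframes(frames.tobytes())

total_frames = sum(block.shape[0] for block in iter_blocks(path))
```

Only one block is held in memory at a time on either side; the wave module fixes up the frame count in the header when the file is closed.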

The problem I was running into comes down to how to convert the float data into bytes for the wave.writeframes function. My data were not being written in the correct order. This is because I was using the numpy.getbuffer() function to convert the data into bytes, which exposes the array's underlying memory and so does not respect the logical orientation of the data:

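import numpy as np  # Python 2 session; np.getbuffer is not available on Python 3

# Two arrays with identical contents but different memory layouts
x0 = np.array([[0,1],[2,3],[4,5]],dtype='int8')
x1 = np.array([[0,2,4],[1,3,5]],dtype='int8').transpose()
if np.array_equal(x0, x1):
    print "Data are equal"
else:
    print "Data are not equal"
b0 = np.getbuffer(x0)
b1 = np.getbuffer(x1)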

result:

Data are equal

In [453]: [b for b in b0]
Out[453]: ['\x00', '\x01', '\x02', '\x03', '\x04', '\x05']

In [454]: [b for b in b1]
Out[454]: ['\x00', '\x02', '\x04', '\x01', '\x03', '\x05']

I assume the order of bytes is determined by the initial allocation in memory, since numpy.transpose() does not rewrite the data but just returns a view. However, because this is hidden behind the numpy array interface, debugging it before I knew what the issue was turned out to be a doozy.
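The view behaviour can be checked directly. The following is a small Python 3 sketch (so it uses `memoryview` rather than `np.getbuffer`) that inspects the layout flags and strides of the two arrays and shows that a C-ordered copy has the expected raw bytes. The variable names other than `x0` and `x1` are made up for illustration.

```python
import numpy as np

x0 = np.array([[0, 1], [2, 3], [4, 5]], dtype="int8")
x1 = np.array([[0, 2, 4], [1, 3, 5]], dtype="int8").transpose()

# Same logical contents, different memory layout.
same_values = np.array_equal(x0, x1)
x0_c = x0.flags["C_CONTIGUOUS"]
x1_c = x1.flags["C_CONTIGUOUS"]
x1_is_view = x1.base is not None  # the transpose shares memory with its base

# Copying into C order makes the underlying buffer match the logical order.
x1_fixed = np.ascontiguousarray(x1)
raw_fixed = memoryview(x1_fixed).tobytes()
```

Here `x0` is C-contiguous while `x1` is not, which is why a raw buffer of `x1` comes out in the original allocation order.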

A solution is to use numpy's tostring() function:

s0 = x0.tostring()
s1 = x1.tostring()

In [455]: s0
Out[455]: '\x00\x01\x02\x03\x04\x05'

In [456]: s1
Out[456]: '\x00\x01\x02\x03\x04\x05'
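On more recent NumPy versions, `tostring()` has been deprecated in favour of `tobytes()`, which as far as I know behaves the same way. A short Python 3 sketch of the same check, with the `order` argument used to reproduce the memory-order result for comparison:

```python
import numpy as np

x0 = np.array([[0, 1], [2, 3], [4, 5]], dtype="int8")
x1 = np.array([[0, 2, 4], [1, 3, 5]], dtype="int8").transpose()

# Default order is C (row-major) over the logical array, regardless of layout.
s0 = x0.tobytes()
s1 = x1.tobytes()

# Asking for Fortran (column-major) order gives the other byte sequence.
s1_f = x1.tobytes(order="F")
```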

This is probably obvious to anyone who saw the tostring() function first, but somehow my search did not dig up any good documentation on how to format an entire numpy array for wave file writing other than to use scipy.io.wavfile. So here it is. Just for completeness (note that "features" is originally n_channels x n_samples, which is why I had this data order issue to begin with):

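import wave

# features: float array of shape (n_channels, n_samples), assumed scaled to [-1, 1]
outfile = wave.open(output_file, mode='w')
outfile.setnchannels(features.shape[0])
outfile.setframerate(fs)
outfile.setsampwidth(2)  # 2 bytes per sample = 16-bit
# transpose to (n_samples, n_channels) so the channels are interleaved per frame
frames = (features*(2**15-1)).astype('i2').transpose().tostring()
outfile.writeframes(frames)
outfile.close()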
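Since the motivation was data too large to convert in one go, the same idea can be applied chunk by chunk. This is only a sketch in Python 3: the function name `write_wav_chunked`, the `chunk` parameter, the clipping step, and the test signal are assumptions added for illustration, not part of the code above.

```python
import os
import tempfile
import wave

import numpy as np


def write_wav_chunked(path, features, fs, chunk=65536):
    """Write float features of shape (n_channels, n_samples) as 16-bit PCM."""
    n_channels, n_samples = features.shape
    with wave.open(path, "wb") as out:
        out.setnchannels(n_channels)
        out.setsampwidth(2)
        out.setframerate(fs)
        for start in range(0, n_samples, chunk):
            block = features[:, start:start + chunk]
            # Clip so values slightly outside [-1, 1] do not wrap around.
            pcm = (np.clip(block, -1.0, 1.0) * (2**15 - 1)).astype("<i2")
            # Transpose to (n_samples, n_channels) so samples interleave.
            out.writeframes(pcm.T.tobytes())


fs = 16000
t = np.arange(fs) / fs
features = np.stack([np.sin(2 * np.pi * 440 * t), np.zeros_like(t)])
path = os.path.join(tempfile.mkdtemp(), "chunked.wav")
write_wav_chunked(path, features, fs, chunk=3000)

# Read the file back to confirm frame count and channel interleaving.
with wave.open(path, "rb") as wf:
    n_frames = wf.getnframes()
    data = np.frombuffer(wf.readframes(n_frames), dtype="<i2").reshape(-1, 2)
```

In real use `features` could be something like a memory-mapped array, so that only one chunk is converted at a time.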
Andrew Schwartz
  • Instead of `tostring`, you can use `struct.pack`, which allows you to specify endianness and size. See http://stackoverflow.com/a/19999599/120261 for an example of reading with `struct.unpack`, hopefully it will be a useful starting point if you want to try this other way. – mtrw Feb 17 '15 at 21:11
  • Try ``np.getbuffer(x1.ravel())``. Also, what about using ``x0.tofile()`` instead of ``x0.tostring()``? If you are concerned about endianness, read http://docs.scipy.org/doc/numpy/user/basics.byteswapping.html – Dietrich Feb 17 '15 at 23:04
  • I forgot: When creating multidimensional arrays, you can use the parameter ``order`` to determine the way they are written to memory: http://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html – Dietrich Feb 17 '15 at 23:11
  • @Dietrich ravel(), excellent! For tofile() I'm unsure if that will bypass some of the wave module features for keeping track of info to put into the headers, but could certainly try it out (later). – Andrew Schwartz Feb 18 '15 at 18:44

1 Answer

For me tostring works fine. Note that in WAVE an 8-bit file must be unsigned, whereas others (16- or 32-bit) must be signed.
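To make the signedness convention concrete: PCM WAVE data stores 8-bit samples as unsigned bytes centred on 128, and 16-bit samples as signed little-endian integers. A minimal Python 3 sketch of a converter follows; the function name `float_to_pcm` and its scaling choices are assumptions for illustration, and it only covers 1- and 2-byte sample widths.

```python
import numpy as np


def float_to_pcm(x, sampwidth):
    """Convert floats in [-1, 1] to PCM bytes for a 1- or 2-byte sample width."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    if sampwidth == 1:
        # 8-bit WAV data is unsigned, with silence at 128.
        return np.round(x * 127 + 128).astype("u1").tobytes()
    if sampwidth == 2:
        # 16-bit WAV data is signed little-endian.
        return np.round(x * 32767).astype("<i2").tobytes()
    raise ValueError("sketch only handles 8- and 16-bit samples")


silence_8 = float_to_pcm([0.0], 1)            # b'\x80'
full_scale_16 = float_to_pcm([1.0, -1.0], 2)  # b'\xff\x7f\x01\x80'
```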

Some dirty demo code that works for me:

import wave
import numpy as np

SAMPLERATE=44100
BITWIDTH=8
CHANNELS=2

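def gensine(freq, dur):
    # Generate a sine tone as an integer array of shape (n_samples, CHANNELS)
    t = np.linspace(0, dur, int(round(dur*SAMPLERATE)))
    x = np.sin(2.0*np.pi*freq*t)
    if BITWIDTH==8:
        # 8-bit samples are unsigned: shift and scale to 0..255
        x = x+abs(min(x))
        x = np.array( np.round( (x/max(x)) * 255) , dtype=np.dtype('<u1'))
    else:
        # wider samples are signed little-endian integers
        x = np.array(np.round(x * ((2**(BITWIDTH-1))-1)), dtype=np.dtype('<i%d' % (BITWIDTH//8)))

    # duplicate the signal into every channel, interleaved per frame
    return np.repeat(x,CHANNELS).reshape((len(x),CHANNELS))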

output_file="test.wav"

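outfile = wave.open(output_file, mode='wb')
# sample width is given in bytes and must be an integer, hence //
outfile.setparams((CHANNELS, BITWIDTH//8, SAMPLERATE, 0, 'NONE', 'not compressed'))
# tostring() is called tobytes() on newer NumPy versions
outfile.writeframes(gensine(440, 1).tostring())
outfile.writeframes(gensine(880, 1).tostring())
outfile.close()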
Frank Zalkow