20
ifile = wave.open("input.wav")

how can I write this file into a numpy float array now?

IAM

4 Answers

42
>>> import numpy
>>> from scipy.io.wavfile import read
>>> a = read("adios.wav")  # returns (sample rate, data)
>>> numpy.array(a[1],dtype=float)
array([ 128.,  128.,  128., ...,  128.,  128.,  128.])

read returns a tuple of the sample rate and the sample data. The data is typically an integer array (the raw bytes of the file interpreted as ints); here we just convert it to float type.

You can read more about `read` here: https://docs.scipy.org/doc/scipy/reference/tutorial/io.html#module-scipy.io.wavfile

SuperStormer
Joran Beasley
  • Thanks! One more question: how could I do this for all .wav files in the current working directory? I mean reading each file into an array in a loop, and concatenating it to a main array at the end of each step? – IAM May 27 '13 at 19:04
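One possible way to handle the follow-up question about a whole directory is sketched below. It is not from the answer above: it uses the standard-library `wave` module (as in the question) rather than `scipy.io.wavfile.read`, and it assumes every file holds 16-bit PCM samples. The helper names `read_wav_as_float` and `load_all_wavs` are hypothetical.

```python
import glob
import os
import wave

import numpy


def read_wav_as_float(path):
    """Read one WAV file into a float array (assumption: 16-bit PCM samples)."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit samples")
        raw = wav_file.readframes(wav_file.getnframes())
    # "<i2" is little-endian signed 16-bit, the layout WAV uses for 16-bit PCM
    return numpy.frombuffer(raw, dtype="<i2").astype(float)


def load_all_wavs(directory="."):
    """Concatenate the samples of every .wav file in `directory`, in sorted filename order."""
    paths = sorted(glob.glob(os.path.join(directory, "*.wav")))
    arrays = [read_wav_as_float(path) for path in paths]
    if not arrays:
        return numpy.array([], dtype=float)
    # Concatenating once at the end avoids re-copying the main array on every iteration
    return numpy.concatenate(arrays)
```

Calling `load_all_wavs()` with no argument scans the current working directory. Collecting the per-file arrays in a list and concatenating once is usually cheaper than growing the main array inside the loop.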
22

Seven years after the question was asked...

import wave
import numpy

# Read file to get buffer
ifile = wave.open("input.wav")
samples = ifile.getnframes()
audio = ifile.readframes(samples)

# Convert buffer to float32 using NumPy (assumes 16-bit samples)
audio_as_np_int16 = numpy.frombuffer(audio, dtype=numpy.int16)
audio_as_np_float32 = audio_as_np_int16.astype(numpy.float32)

# Normalise float32 array so that values are between -1.0 and +1.0
max_int16 = 2**15
audio_normalised = audio_as_np_float32 / max_int16
Matthew Walker
  • How should I install the `wave` module? `pip install wave`? – Unsigned_Arduino Jun 14 '20 at 03:24
  • 3
    @Unsigned_Arduino Have you just tried it? According to the docs, the wave module has been part of Python since at least version 2.7, and it's still included in version 3.8: https://docs.python.org/3.8/library/wave.html – Matthew Walker Jun 14 '20 at 05:50
  • Just tried it, it's included. I had never seen this module before, so I questioned its existence in the PSL. – Unsigned_Arduino Jun 14 '20 at 13:44
  • 1
    Hi Matthew Walker, thanks for such a nice answer. The size of audio_normalised is twice that of samples, so does it hold data for 2 channels, or something else? Could you elaborate a bit? – Trees Nov 27 '20 at 05:39
  • I’m not sure I understand your question. By size do you mean the length in bytes perhaps? If that’s the case then I suspect the answer lies in the int16 (2 byte) representation of the PCM in a wave file. Does that answer your question? – Matthew Walker Nov 28 '20 at 09:00
  • How to know if we should use `int16` or `int32` in `frombuffer`? Any wave file attributes to tell? – avocado Apr 25 '21 at 15:45
  • Oh, I think `wave.getsampwidth` is the value for `int16` or `int32` in `frombuffer`, right? – avocado Apr 25 '21 at 16:07
  • 1
    @avocado [getsampwidth()](https://docs.python.org/3/library/wave.html#wave.Wave_read.getsampwidth) returns the sample width in bytes, so 2 bytes => `int16`, or 4 bytes => `int32`. I guess I just hadn't come across WAV files with anything other than 2 bytes per sample. Good point. – Matthew Walker Apr 26 '21 at 23:46
  • I wrote a more detailed answer on how to get a `numpy` array from a `wav` file [here](https://stackoverflow.com/a/71042208/13180090) taking into account the sample width and channels. Although I haven't experimented with the normalization so I won't add it as a separate answer. – Andreu Gimenez Feb 08 '22 at 23:49
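The points raised in these comments (sample width and channel count) can be combined into one reader. The following is only a sketch, not the code from the linked answer: the name `read_wav` and the `DTYPE_BY_WIDTH` table are hypothetical, and 24-bit (3-byte) files are deliberately rejected because NumPy has no matching integer dtype.

```python
import wave

import numpy

# Sample width in bytes -> NumPy dtype.
# WAV stores 8-bit audio unsigned and wider PCM as signed little-endian integers.
DTYPE_BY_WIDTH = {1: numpy.dtype("u1"), 2: numpy.dtype("<i2"), 4: numpy.dtype("<i4")}


def read_wav(path):
    """Return (frame_rate, samples) with samples shaped (n_frames, n_channels)."""
    with wave.open(path, "rb") as wav_file:
        width = wav_file.getsampwidth()
        channels = wav_file.getnchannels()
        rate = wav_file.getframerate()
        raw = wav_file.readframes(wav_file.getnframes())
    if width not in DTYPE_BY_WIDTH:
        raise ValueError(f"unsupported sample width: {width} bytes")
    samples = numpy.frombuffer(raw, dtype=DTYPE_BY_WIDTH[width])
    # Channels are interleaved frame by frame, so each row is one frame
    return rate, samples.reshape(-1, channels)
```

This also shows why a flat array can be twice as long as `getnframes()`: a stereo file holds two samples per frame.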
9

Use the librosa package and simply load the WAV file into a NumPy array with:

import librosa

y, sr = librosa.load(filename)

This loads and decodes the audio as a time series y, represented as a one-dimensional NumPy floating point array. The variable sr contains the sampling rate of y, that is, the number of samples per second of audio. By default, all audio is mixed to mono and resampled to 22050 Hz at load time. This behavior can be overridden by supplying additional arguments to librosa.load().
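As a rough illustration of overriding those defaults, the sketch below passes `sr=None` and `mono=False` to `librosa.load()`. The wrapper name `load_native` is hypothetical, and librosa is imported inside the function only so that the sketch can be defined without the package installed.

```python
def load_native(filename):
    """Load audio without librosa's default resampling and mono mix-down."""
    import librosa  # third-party package; only needed when the function is called

    # sr=None keeps the file's own sampling rate; mono=False keeps the channels separate
    y, sr = librosa.load(filename, sr=None, mono=False)
    return y, sr
```

With `mono=False`, a multi-channel file should come back as a two-dimensional array rather than a one-dimensional one.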

More information is available in the librosa library documentation.

Community
Esterlinkof
0

I don't have enough reputation to comment under @Matthew Walker's answer, so I am posting a new answer to add an observation to it: max_int16 should be 2**15-1, not 2**15.

Better yet, I think the normalization line should be replaced with:

audio_normalised = audio_as_np_float32 / numpy.iinfo(numpy.int16).max

If the audio is stereo (i.e. two channels), the left and right values are interleaved, so the following can be used to get the stereo array:

channels = ifile.getnchannels()
# Interleaved samples (L, R, L, R, ...) become one row per frame, one column per channel
audio_stereo = audio_normalised.reshape(-1, channels)

I believe this answers @Trees's question in the comments section.

  • The issue with the definition of `max_int16` is interesting. The range of 16 bit integers is -32,768 to 32,767. If we want to scale from -1 to 1 then we want to divide by the largest possible value, in an absolute sense, or 32,768, which is `2**15`. Hence the definition of `max_int16` in my answer. – Matthew Walker Apr 26 '21 at 23:56
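The difference between the two divisors can be checked numerically. This is just a small sketch of the arithmetic at the two extremes of the int16 range; the variable names are illustrative.

```python
import numpy

# The smallest and largest values an int16 sample can take
extremes = numpy.array([-32768, 32767], dtype=numpy.int16).astype(numpy.float32)

by_2_15 = extremes / 2**15             # divide by 32768
by_int16_max = extremes / (2**15 - 1)  # divide by 32767

# Dividing by 32768 keeps every sample inside [-1.0, 1.0)
assert by_2_15.min() == -1.0 and by_2_15.max() < 1.0
# Dividing by 32767 maps 32767 to exactly 1.0 but pushes -32768 slightly below -1.0
assert by_int16_max.max() == 1.0 and by_int16_max.min() < -1.0
```

So the choice is between never exceeding the [-1, 1] range (divide by `2**15`) and reaching exactly +1.0 at full positive scale (divide by `2**15 - 1`).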