Downsampling wav audio file

Question

I have to downsample a WAV file from 44100 Hz to 16000 Hz without using any external Python libraries, so preferably with wave and/or audioop. I tried changing the file's frame rate to 16000 with the setframerate function, but that only slows down the entire recording. How can I downsample the audio file to 16 kHz and keep the audio the same length?

If you go down to 11025Hz it will be easier, just low pass filter and then take every 4th sample — samgak, Jun 03 '15 at 12:22
Is audioop's ratecv what you're after? https://docs.python.org/2/library/audioop.html#audioop.ratecv — Jim Jeffries, Jun 03 '15 at 12:23
It needs to be 16kHz because our pipeline tool needs to export it for Unity projects. Would you mind giving me an example of using the audioop.ratecv function? Because I'm confused with the fragment parameter of that function. How do I get it? @JimJeffries — d3cr1pt0r, Jun 03 '15 at 12:33

score 57 · Answer 1 · answered Mar 18 '18 at 12:31

You can use Librosa's load() function,

import librosa    
y, s = librosa.load('test.wav', sr=8000) # Downsample 44.1kHz to 8kHz

The extra effort to install Librosa is probably worth the peace of mind.

Pro-tip: when installing Librosa on Anaconda, you need to install ffmpeg as well, so

pip install librosa
conda install -c conda-forge ffmpeg

This saves you the NoBackendError() error.

wafflecat

Probably the best comment here, and seems most up to date as well. Just lacking the _save_ that OP requested, which is as simple as `librosa.output.write_wav(filename, y, sr)`. – hyit Jun 29 '18 at 12:11
Librosa has removed write_wav since version 0.8 . It is recommended to use soundfile.write now. – Austin Sep 25 '20 at 03:40
@Austin - As recommended by Austin, write_wav is removed, however if someone still want to use older librosa versions , refer this [answer](https://stackoverflow.com/a/66999542/6503329) – Shreyesh Desai Apr 08 '21 at 07:56

score 26 · Answer 2 · edited Apr 01 '20 at 14:45

To downsample a signal (reduce its sampling rate, also called decimation) or upsample it (increase the sampling rate), you need to interpolate between your data points.

The idea is that you need to somehow draw a curve between your points and then take values from this curve at the new sampling rate. You want to know the value of the sound wave at times that were not sampled, so you have to estimate those values one way or another. The only case where subsampling is easy is when you divide the sampling rate by an integer $k$. In that case you take buckets of $k$ samples and keep only the first one of each (ideally after low-pass filtering, to avoid aliasing). But this won't answer your question, because 44100/16000 is not an integer. See the picture below, where a curve is sampled at two different scales.
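The integer-factor case can be sketched in plain Python. This is a minimal illustration, not a production resampler: the function name `decimate_by_integer` is hypothetical, and instead of keeping only the first sample of each bucket it averages the bucket, which acts as a crude low-pass filter. A proper anti-aliasing filter would do better.

```python
def decimate_by_integer(samples, k):
    """Reduce the sampling rate by an integer factor k.

    Each bucket of k samples is averaged (a crude low-pass filter that
    limits aliasing) and replaced by a single output sample.
    A trailing partial bucket is dropped.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    out = []
    for start in range(0, len(samples) - k + 1, k):
        bucket = samples[start:start + k]
        out.append(sum(bucket) / k)
    return out


# 44100 Hz -> 11025 Hz is an integer factor of 4
one_second = [float(i % 8) for i in range(44100)]
reduced = decimate_by_integer(one_second, 4)
print(len(reduced))  # 11025
```

One second of input (44100 samples) becomes one second of output (11025 samples), so the duration is preserved. The same trick does not work for 16000 Hz, because 44100/16000 is not an integer.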

You could do it by hand if you understand the principle, but I strongly recommend using a library, because interpolating correctly is neither easy nor obvious.

You could use linear interpolation (connect neighbouring points with a line), quadratic interpolation (fit a piece of a polynomial through three points), or (often the best choice for sound) a Fourier transform, interpolating in the frequency domain. Since a Fourier transform isn't something you want to rewrite by hand, use a library such as scipy if you want good subsampling/upsampling. See the following picture for two upsampled curves produced by different scipy algorithms. The "resample" function uses a Fourier transform.
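For illustration, the simplest of these options, linear interpolation, fits in a few lines of standard-library Python. This is only a sketch under stated assumptions: `resample_linear` is a hypothetical name, the input is a mono sequence of numbers, and no low-pass filter is applied, so downsampling content above half the new rate will alias.

```python
def resample_linear(samples, in_rate, out_rate):
    """Resample a mono sequence by linear interpolation between neighbours."""
    if not samples:
        return []
    n_out = int(round(len(samples) * out_rate / in_rate))
    step = in_rate / out_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for j in range(n_out):
        pos = j * step          # fractional position in the input
        i = int(pos)
        if i >= last:
            # Past the final interval: hold the last input value
            out.append(float(samples[last]))
            continue
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out


ramp = list(range(44100))                  # one second at 44100 Hz
converted = resample_linear(ramp, 44100, 16000)
print(len(converted))                      # 16000, still one second
```

The output length follows the ratio of the two rates, which is what keeps the duration unchanged, unlike simply rewriting the frame rate in the header.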

In my case I was loading a 44100 Hz wave file and needed data sampled at 48000 Hz, so I wrote the following few lines to load my data:

    # Imports
    from scipy.io import wavfile
    import scipy.signal as sps

    # Your new sampling rate
    new_rate = 48000

    # Read file
    sampling_rate, data = wavfile.read(path)

    # Resample data
    number_of_samples = round(len(data) * float(new_rate) / sampling_rate)
    data = sps.resample(data, number_of_samples)

Note that you can also use scipy.signal.decimate when you are only downsampling by an integer factor and want something faster than the Fourier method.

Any comments on this opinion? "scipy.signal.resample sucks for audio resampling. That becomes apparent quite quickly - it works in frequency domain, by basically truncation or zero-padding the signal in the frequency domain. This is quite ugly in time domain (especially since it assumes the signal to be circular)." source: http://signalsprocessed.blogspot.com/2016/08/audio-resampling-in-python.html — Matthew Walker, Jul 18 '20 at 02:20
@MatthewWalker You can use `scipy.signal.resample_poly` to resample with polyphase filtering in the time domain. `resample` acts in the frequency domain, and you can explicitly control the `window` used by the Fourier transform. For resample_poly you can control padding with `padtype` and `cval`. I think you only need to adjust the parameters to your needs if you actually see artifacts in the resampling. This will definitely depend on the type of signal you are working with. — Jeremy Cochoy, Jul 19 '20 at 09:13
@MatthewWalker From the Scipy documentation: `The argument window controls a Fourier-domain window that tapers the Fourier spectrum before zero-padding to alleviate ringing in the resampled values for sampled signals you didn’t intend to be interpreted as band-limited.` — Jeremy Cochoy, Jul 19 '20 at 09:18

score 13 · Answer 3 · answered Jun 05 '15 at 07:29

Thank you all for your answers. I found a solution already and it works very well. Here is the whole function.

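import audioop  # standard library up to Python 3.12; removed in Python 3.13
import os
import wave

def downsampleWav(src, dst, inrate=44100, outrate=16000, inchannels=2, outchannels=1):
    if not os.path.exists(src):
        print('Source not found!')
        return False

    # dirname is empty when dst is a bare file name in the current directory
    dst_dir = os.path.dirname(dst)
    if dst_dir and not os.path.exists(dst_dir):
        os.makedirs(dst_dir)

    try:
        s_read = wave.open(src, 'rb')
        s_write = wave.open(dst, 'wb')
    except (OSError, EOFError, wave.Error):
        print('Failed to open files!')
        return False

    n_frames = s_read.getnframes()
    data = s_read.readframes(n_frames)

    try:
        # ratecv returns (fragment, state); only the fragment is audio data.
        # The 2 is the sample width in bytes (16-bit audio).
        converted, _ = audioop.ratecv(data, 2, inchannels, inrate, outrate, None)
        # Only mix down when the input really is stereo; calling tomono on
        # mono data would cut the audio length in half.
        if inchannels == 2 and outchannels == 1:
            # Weights (1, 0) keep the left channel and discard the right
            converted = audioop.tomono(converted, 2, 1, 0)
    except audioop.error:
        print('Failed to downsample wav')
        return False

    try:
        s_write.setparams((outchannels, 2, outrate, 0, 'NONE', 'Uncompressed'))
        s_write.writeframes(converted)
    except (OSError, wave.Error):
        print('Failed to write wav')
        return False

    try:
        s_read.close()
        s_write.close()
    except (OSError, wave.Error):
        print('Failed to close wav files')
        return False

    return True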

I know this is old but I just had the same problem so I tried the code and I think it has a subtle bug. If my inchannels=1 and outchannels=1 the tomono function will be called anyway which messes up my audio signal (the length gets cut in half). Also when writing the frames, shouldn't you only write converted[0] (depending if tomono was called obviously) because the newstate returned by ratecv is irrelevant? — user667804, Feb 10 '16 at 15:41

score 9 · Answer 4 · answered Feb 24 '20 at 07:29

I tried using Librosa, but for some reason, even with y, s = librosa.load('test.wav', sr=16000) followed by librosa.output.write_wav(filename, y, sr), the sound files were not saved with the given sample rate (16000, downsampled from 44 kHz). pydub, an awesome library by jiaaro, works well. I used the following commands:

from pydub import AudioSegment as am
sound = am.from_file(filepath, format='wav', frame_rate=22050)
sound = sound.set_frame_rate(16000)
sound.export(filepath, format='wav')

The code above reads the file with a frame_rate of 22050, changes the rate to 16000, and the export function overwrites the existing file with the new frame_rate. It works better than librosa for me. I would like to compare the speed of the two packages but haven't managed to yet, since I have very little data.

Reference: https://github.com/jiaaro/pydub/issues/232

Librosa has removed write_wav since version 0.8 . It is recommended to use soundfile.write now. — Austin, Sep 25 '20 at 03:40
In my case, it won't work if you run `sound.set_frame_rate(16000)` instead of assigning `sound = sound.set_frame_rate(16000)` since `set_frame_rate(..)` spawns a new object. — Nikolas, May 15 '23 at 08:40

score 6 · Answer 5 · answered Jun 03 '15 at 19:09

You can use resample in scipy. It's a bit of a headache, because some type conversion is needed between Python's native bytestrings and the arrays scipy expects. There's another headache: the wave module does not report whether the data is signed, only the sample width (8 or 16 bits). By WAV convention, 8-bit PCM is unsigned and 16-bit PCM is signed, so the array type has to be chosen from the sample width. It should work for both widths, but I haven't tested it.

Here's a small program which converts 8- and 16-bit mono from 44.1 kHz to 16 kHz. If you have stereo or use other formats, it shouldn't be difficult to adapt. Edit the input/output names at the start of the code; I never got around to using command line arguments.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
#  downsample.py
#  
#  Copyright 2015 John Coppens <john@jcoppens.com>
#  
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#  
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#  
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
#  MA 02110-1301, USA.
#  
#

inwave = "sine_44k.wav"
outwave = "sine_16k.wav"

import wave
import numpy as np
import scipy.signal as sps

class DownSample():
    def __init__(self):
        self.in_rate = 44100.0
        self.out_rate = 16000.0

    def open_file(self, fname):
        try:
            self.in_wav = wave.open(fname)
        except (OSError, EOFError, wave.Error):
            print("Cannot open wav file (%s)" % fname)
            return False

        if self.in_wav.getframerate() != self.in_rate:
            print("Frame rate is not %d (it's %d)" % \
                  (self.in_rate, self.in_wav.getframerate()))
            return False

        self.in_nframes = self.in_wav.getnframes()
        print("Frames: %d" % self.in_nframes)

        # WAV PCM convention: 8-bit samples are unsigned, 16-bit are signed
        if self.in_wav.getsampwidth() == 1:
            self.nptype = np.uint8
        elif self.in_wav.getsampwidth() == 2:
            self.nptype = np.int16
        else:
            print("Unsupported sample width (%d bytes)" % self.in_wav.getsampwidth())
            return False

        return True

    def resample(self, fname):
        self.out_wav = wave.open(fname, "w")
        self.out_wav.setframerate(int(self.out_rate))
        self.out_wav.setnchannels(self.in_wav.getnchannels())
        self.out_wav.setsampwidth(self.in_wav.getsampwidth())

        print("Nr output channels: %d" % self.out_wav.getnchannels())

        # Decode the byte string into an array of samples (mono assumed)
        audio = np.frombuffer(self.in_wav.readframes(self.in_nframes), self.nptype)
        # Count samples, not bytes, so that 16-bit input keeps its duration
        nroutsamples = int(round(len(audio) * self.out_rate / self.in_rate))
        print("Nr output samples: %d" % nroutsamples)

        audio_out = sps.resample(audio, nroutsamples)
        # Round and clip before casting so overshoot cannot wrap around
        info = np.iinfo(self.nptype)
        audio_out = np.clip(np.round(audio_out), info.min, info.max).astype(self.nptype)

        self.out_wav.writeframes(audio_out.tobytes())

        self.out_wav.close()

def main():
    ds = DownSample()
    if not ds.open_file(inwave): return 1
    ds.resample(outwave)
    return 0

if __name__ == '__main__':
    main()
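The type conversion mentioned at the start of this answer can also be done without NumPy. The sketch below uses only the standard library; the helper names `bytes_to_samples` and `samples_to_bytes` are hypothetical, and it assumes the usual WAV convention (8-bit unsigned, 16-bit signed little-endian) and mono or already de-interleaved data.

```python
import array
import sys


def bytes_to_samples(raw, sampwidth):
    """Decode raw PCM bytes from wave.readframes() into a list of ints."""
    if sampwidth == 1:
        return list(raw)  # each byte is already an unsigned sample (0..255)
    if sampwidth == 2:
        samples = array.array("h")  # signed 16-bit
        samples.frombytes(raw)
        if sys.byteorder == "big":
            samples.byteswap()      # WAV data is little-endian
        return samples.tolist()
    raise ValueError("only 8-bit and 16-bit PCM are handled in this sketch")


def samples_to_bytes(samples, sampwidth):
    """Encode numbers back to PCM bytes, rounding and clipping to range."""
    if sampwidth == 1:
        lo, hi, code = 0, 255, "B"
    elif sampwidth == 2:
        lo, hi, code = -32768, 32767, "h"
    else:
        raise ValueError("only 8-bit and 16-bit PCM are handled in this sketch")
    clipped = [min(hi, max(lo, int(round(s)))) for s in samples]
    out = array.array(code, clipped)
    if sampwidth == 2 and sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()


print(bytes_to_samples(b"\x00\x80\xff\x7f", 2))  # [-32768, 32767]
```

Clipping before encoding matters because resampling can overshoot the original range slightly, and an unclipped value would otherwise wrap around or raise an error.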

ARHAM RUMI · Answer 6 · 2022-09-30T06:54:35.257

You can do this with the ffmpeg tool on Windows, macOS or Linux. Download ffmpeg from the official link (https://ffmpeg.org/download.html). I downloaded the gyan.dev build. For Windows, follow these steps:

Extract the downloaded file
Rename the folder to ffmpeg
Cut this folder and paste it inside the OS drive. Usually, it is C drive
Move to the bin folder where ffmpeg.exe resides
Click on the address bar and Copy the Path, for me, it's C:\ffmpeg\bin
Open environment variables by typing env in the start menu
Under the Advanced tab click on the Environment Variables button
Under User variables select Path and click on Edit
Click on the New button and paste that copied path in the field
Click OK for every window
Now open CMD and type ffmpeg -version to confirm if you have added the path to environment variable correctly. If yes, you will see information about ffmpeg otherwise an error.
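The final check in the steps above can also be done from Python before calling ffmpeg. This is a small sketch using the standard library's `shutil.which`; the helper name `ffmpeg_available` is hypothetical.

```python
import shutil


def ffmpeg_available(executable="ffmpeg"):
    """Return True when the named program can be found on the PATH."""
    return shutil.which(executable) is not None


if not ffmpeg_available():
    print("ffmpeg was not found on PATH; re-check the environment variable")
```

Running this at the top of a script gives a clearer message than a failed command later on.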

Now we are ready to resample audio. Add the following code to your Python file.

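import subprocess

source_file = "path/to/input/file/with/extension"    # "source_file.wav"
output_file = "path/to/output/file/with/extension"   # "compressed_output_file.wav"

# -ac 1 mixes down to mono, -ar 16000 sets the output sample rate.
# Passing the arguments as a list avoids quoting problems with spaces in paths.
command = ["ffmpeg", "-i", source_file, "-ac", "1", "-ar", "16000", output_file]
print(" ".join(command))
subprocess.run(command, check=True)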

In many of my projects, I have used this code for both up-sampling and down-sampling for wav and mp3 files.

Note: Up-Sampling will increase your file size while Down-Sampling will reduce file size.
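The size effect is simple arithmetic for uncompressed PCM such as WAV: bytes are roughly duration times sample rate times channels times bytes per sample. The sketch below illustrates this; the function name `pcm_data_bytes` is hypothetical, the header is ignored, and the formula does not apply to compressed formats such as mp3, where the bitrate decides the size.

```python
def pcm_data_bytes(duration_s, sample_rate, channels=1, sampwidth=2):
    """Approximate size of uncompressed PCM audio data, excluding the header."""
    return int(duration_s * sample_rate) * channels * sampwidth


# One minute of 16-bit audio
print(pcm_data_bytes(60, 44100, channels=2))  # 10584000 (44.1 kHz stereo)
print(pcm_data_bytes(60, 16000, channels=1))  # 1920000  (16 kHz mono)
```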

Note that `ffmpeg` will output 16 bit audio samples if not told otherwise by using the option `-sample_fmt`. I can't understand this design choice by `ffmpeg`... — Itamar Katz, Dec 17 '22 at 11:43

score 1 · Answer 7 · answered Oct 08 '22 at 16:47

If you use the tensorflow library, here is an example that converts a 44100 Hz stereo .mp3 to a 16000 Hz mono .wav:

!pip install tensorflow-io==0.25.0   # something broke with ==0.26.0
import tensorflow_io as tfio
import tensorflow as tf
import numpy as np

srcFilePath = '/content/data/dataset_phoneme_in/she/pronunciation_en_she.mp3'
dstFilePath = '/content/temp/1.wav'

rateOut = 16000

# AudioIOTensor reads various formats; works with tensorflow-io==0.25.0
audioIOTensor = tfio.audio.AudioIOTensor(srcFilePath)
print(audioIOTensor.shape)
chanalsIn = int(audioIOTensor.shape[1])
rateIn = int(audioIOTensor.rate)
print(audioIOTensor.shape[1])
audioTensor = audioIOTensor[0:]  # get the audio block

if chanalsIn > 1:  # stereo to mono: average the channels
  audioTensor = tf.convert_to_tensor(np.average(audioTensor.numpy(), axis=1))
else:  # already mono: drop the channel axis
  audioTensor = tf.squeeze(audioTensor, axis=[1])

print(audioTensor.shape)

# change rate
audioTensor = tfio.audio.resample(audioTensor, rateIn, rateOut)

print(audioTensor.shape)

# convert to wav and save
#wav = tf.cast(audioTensor, tf.float32) / 32768.0
audioTensor = tf.expand_dims(audioTensor, axis=1)  # add channel axis for tf.audio.encode_wav
print(audioTensor.shape)
outWavAudio = tf.audio.encode_wav(audio=audioTensor, sample_rate=rateOut)

tf.io.write_file(dstFilePath, outWavAudio)

Gaurao Mate · Answer 8 · 2020-12-16T10:27:42.777

0

First, import the librosa library, then use librosa.load(path, sr) to resample the audio file. By default sr (the sampling rate) is 22050. To preserve the file's native sampling rate, pass sr=None; otherwise the audio is resampled to the sampling rate provided.

edited Dec 16 '20 at 10:27

answered Dec 16 '20 at 10:19
