I'm building a simple Python application that alters the speed of an audio track. (I know that changing the frame rate of audio also changes its pitch, and I don't care about the pitch being altered.) I tried the solution from abhi krishnan using pydub, which looks like this:

from pydub import AudioSegment
sound = AudioSegment.from_file(…)

def speed_change(sound, speed=1.0):
    # Manually override the frame_rate. This tells the computer how many
    # samples to play per second
    sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={
        "frame_rate": int(sound.frame_rate * speed)
    })
    # Convert the sound with altered frame rate to a standard frame rate
    # so that regular playback programs will work right. They often only
    # know how to play audio at standard frame rate (like 44.1k)
    return sound_with_altered_frame_rate.set_frame_rate(sound.frame_rate)

However, the speed-changed audio sounds distorted or crackly, which does not happen when I do the same thing in Audacity. I'd like to find a way to reproduce in Python how Audacity (or other digital audio editors) changes the speed of audio tracks.

I presume the quality loss is caused by the original audio's low frame rate, which is 8 kHz: .set_frame_rate(sound.frame_rate) resamples the speed-altered audio back to that original, low frame rate. Simple attempts at changing the frame rate of the original audio, of the speed-altered audio, or of the exported audio didn't work.

Is there a way in pydub or another Python module to perform this task the same way Audacity does?

2 Answers

Assuming you want to play the audio back at, say, 1.5x the speed of the original: this is equivalent to resampling the audio down to 2/3 of its samples and pretending the sampling rate hasn't changed. If that is what you are after, I suspect most DSP packages support it (search for "audio resampling").

You can try scipy.signal.resample_poly():


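import numpy as np
from scipy.signal import resample_poly

# raw_data is a bytes object; resample_poly needs a numeric array (mono assumed here)
samples = np.array(sound.get_array_of_samples())
dec_data = resample_poly(samples, up=2, down=3)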

dec_data should have 2/3 as many samples as the original audio. If you play the dec_data samples at the sound's original sampling rate, you should get a sped-up version. The downside of resample_poly is that it needs a rational factor, and a large numerator or denominator makes the output less ideal. You can also try scipy's resample function, or look for other packages that support audio resampling.
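
To pick up and down for an arbitrary speed, one option is to approximate 1/speed with a small fraction. The following is a minimal sketch using only the standard library; speed_to_ratio, estimated_length, and the max_denominator cap of 100 are hypothetical names and values chosen for illustration, not part of scipy or pydub. The length estimate assumes the resampled output has about len * up / down samples, rounded up.

```python
from fractions import Fraction


def speed_to_ratio(speed, max_denominator=100):
    """Return (up, down) such that up/down approximates 1/speed.

    max_denominator is an illustrative cap on `down`, so the
    resampling factor stays a reasonably small fraction.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    ratio = (1 / Fraction(speed)).limit_denominator(max_denominator)
    if ratio.numerator == 0:
        # The speed is too large to represent under this denominator cap.
        raise ValueError("speed too large for the given max_denominator")
    return ratio.numerator, ratio.denominator


def estimated_length(n_samples, up, down):
    """Estimate the sample count after resampling by up/down (rounded up)."""
    return -(-n_samples * up // down)


# Speeding up by 1.5x means keeping 2/3 of the samples.
up, down = speed_to_ratio(1.5)

# 3 seconds of 8 kHz audio is 24000 samples.
new_length = estimated_length(24000, up, down)

# Duration in seconds when played back at the unchanged 8 kHz rate.
new_duration = new_length / 8000
```

The resulting pair can be passed as up and down to resample_poly. For the 8 kHz case in the question, 3 seconds of audio at speed 1.5 comes out at about 16000 samples, which plays in 2 seconds at the unchanged rate.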

kesh

As people have written in the post you mentioned, pydub has a function called speedup() for this purpose. I found that it works well for increasing the audio speed, while your method actually works better for decreasing it. So I combined both in my implementation of aiTransformer Speech Synthesizer, with the following code:

from pydub.effects import speedup
if speed > 1:
    audio = speedup(audio, playback_speed=speed)
else:
    audio = speed_change(audio, speed)

I struggled for a while to find a decent solution for changing audio speed in both directions (even ChatGPT only gave the speedup one). I hope my answer helps others with a similar need.