
I want to augment my audio data for a machine learning project. I am looking for a way to gradually modulate the pitch of an audio clip to simulate the Doppler effect. From what I can see, Librosa and Torchaudio support only a basic pitch-shift function, and I cannot think of a way to solve this other than doing it manually in GarageBand or some other DAW :) Thank you!

stanislax

2 Answers


Pedalboard is a library for audio data augmentation and preprocessing that lets you load and use VST3 plugins. You could then use a VST plugin such as Doppler by Waves, or a similar one.

Jon Nordby

Are any of the libraries listed in the post "How to edit raw PCM audio data without an audio library?" available and working?

If you have the raw source PCM and have a means of playing back PCM, a transform that changes pitch can be calculated.

The simplest case is dropping every second frame from the PCM being output. The segment of signal then plays back in half the time, which makes the pitch of the output twice that of the original (one octave higher).
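
To make that concrete, here is a minimal Python sketch of the "drop every second frame" case. The helper names (`drop_every_second_sample`, `count_upward_zero_crossings`) and the 440 Hz test tone are my own assumptions for illustration. Counting zero crossings is only a rough way to confirm that the same cycles now fit into half the time.

```python
import math

def drop_every_second_sample(pcm):
    """Keep samples 0, 2, 4, ... so the clip plays back in half the time."""
    return pcm[::2]

def count_upward_zero_crossings(pcm):
    """Rough cycle count: how often the signal crosses zero going upward."""
    return sum(1 for a, b in zip(pcm, pcm[1:]) if a < 0 <= b)

sample_rate = 44100
# One second of a 440 Hz sine; the small phase offset keeps samples off exact zero.
tone = [math.sin(2 * math.pi * 440 * n / sample_rate + 0.1)
        for n in range(sample_rate)]
halved = drop_every_second_sample(tone)

# Estimated frequency = cycles / duration, with both clips played at 44100 Hz.
freq_before = count_upward_zero_crossings(tone) / (len(tone) / sample_rate)
freq_after = count_upward_zero_crossings(halved) / (len(halved) / sample_rate)
print(round(freq_before), round(freq_after))  # close to 440 and 880
```

Note that simply discarding samples can alias any content above a quarter of the sample rate, so this is only the conceptual starting point.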

For an intermediate rate of playback (between normal speed and 2x speed), we can calculate PCM values using linear interpolation. Linear interpolation is generally accepted as "accurate enough" for such transforms.

Let's say we want the resulting pitch to be at 110% of the original (e.g., a tone at 440 Hz plays back at 484 Hz). To do this we create an index or cursor that increments by 1.1 for each output value. Given a series of PCM data points pcmIn[0]...pcmIn[n], the first output value is pcmIn[0]; the second lies 1/10 of the way between pcmIn[1] and pcmIn[2], and can be calculated as follows:

pcmOut[1] = pcmIn[1] * (0.9) + pcmIn[2] * (0.1)

and the next will be as follows:

pcmOut[2] = pcmIn[2] * (0.8) + pcmIn[3] * (0.2)

I leave it to the OP to implement this in a more useful/general form in Python. I've only done this in Java (where idx is the incrementing float cursor):

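    int intIndex = (int) idx;  // whole-sample position just below the cursor
    // blend the two neighbors: the closer idx is to a sample, the larger its weight
    pcmOut = pcmIn[intIndex + 1] * (idx - intIndex)
            + pcmIn[intIndex] * ((intIndex + 1) - idx);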
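
For the Python side, a minimal sketch of the same interpolation at a constant playback rate might look like the following. The function name `resample_linear` and its `increment` parameter are hypothetical names of mine, and it uses plain lists rather than any audio library.

```python
def resample_linear(pcm_in, increment):
    """Read pcm_in with a fractional cursor that advances by `increment`
    per output sample, blending the two neighboring input samples."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    pcm_out = []
    idx = 0.0
    # Stop while pcm_in[int_index + 1] is still a valid sample.
    while idx < len(pcm_in) - 1:
        int_index = int(idx)
        frac = idx - int_index  # how far the cursor is past int_index
        pcm_out.append(pcm_in[int_index] * (1.0 - frac)
                       + pcm_in[int_index + 1] * frac)
        idx += increment
    return pcm_out

# The 110% example: the cursor visits 0, 1.1, 2.2, 3.3, 4.4.
ramp = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
faster = resample_linear(ramp, 1.1)
print(faster)  # approximately [0, 11, 22, 33, 44]
```

Adding the increment repeatedly accumulates a little floating-point error, which should be harmless for short clips. For long files it may be worth computing the cursor from the output sample number instead.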

In any event, a Doppler effect would be achieved by having the increment of the idx variable be slightly "sharp" in musical parlance (larger than 1) up until the point where the moving object passes, and slightly flat (smaller than 1) once the object starts moving away. The amount would be calculated based on the angle of approach and the speed of the object relative to the speed of sound in air.
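
As a rough sketch of how that increment could be computed, assume a source moving in a straight line at constant speed past a stationary listener, and use the textbook moving-source formula, ratio = c / (c - v_toward), where v_toward is the part of the source velocity pointing at the listener. The names `doppler_increment` and `apply_doppler` are hypothetical. Placing the pass-by at the midpoint of the clip is an arbitrary choice, and the sketch ignores propagation delay and loudness changes.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, dry air at roughly 20 degrees C

def doppler_increment(t, speed, closest_distance, c=SPEED_OF_SOUND):
    """Pitch ratio at time t (seconds; t = 0 is the closest approach) for a
    source moving at `speed` m/s that passes `closest_distance` metres away."""
    if not 0 <= speed < c:
        raise ValueError("speed must be non-negative and below the speed of sound")
    along = speed * t  # position along the path, negative while approaching
    distance = math.hypot(along, closest_distance)
    # Component of the source velocity pointing toward the listener.
    toward = -speed * along / distance if distance > 0 else 0.0
    return c / (c - toward)

def apply_doppler(pcm_in, sample_rate, speed, closest_distance):
    """Linear-interpolation playback whose cursor increment follows the
    Doppler ratio; the pass-by sits at the midpoint of the clip's timeline."""
    half = len(pcm_in) / (2.0 * sample_rate)
    pcm_out = []
    idx = 0.0
    n = 0
    while idx < len(pcm_in) - 1:
        i = int(idx)
        frac = idx - i
        pcm_out.append(pcm_in[i] * (1.0 - frac) + pcm_in[i + 1] * frac)
        idx += doppler_increment(n / sample_rate - half, speed, closest_distance)
        n += 1
    return pcm_out

sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
shifted = apply_doppler(tone, sample_rate, speed=30.0, closest_distance=5.0)
```

Because the ratio changes smoothly with time, the increment is sharp on approach, exactly 1 at the closest point, and flat afterwards, without any sudden jump.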

Care has to be taken to change the increment gradually. A sudden large change could introduce a signal discontinuity that will sound like a click transient. I don't have a rule of thumb for what size of increment change is tolerable; I've treated it as something to listen for. For example, going directly from a playback speed of 1x to 2x might cause a click, but if you ramp the increment from 1 to 2 over 441 PCM values (a transition lasting 1/100th of a second at a 44100 Hz sample rate), I've never heard a transient generated with that degree of caution. I more usually use something like 64 or 128 samples to go from speed A to speed B, and even fewer can work.
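
One way to sketch such a gradual transition in Python is to precompute the increments as a linear ramp and feed them to the interpolating cursor one per output sample. The names `ramp_increments` and `play_with_increments` are hypothetical. A linear ramp is just one simple choice, not a claim that it is the best curve.

```python
def ramp_increments(start, end, steps):
    """Increments moving linearly from `start` to `end` over `steps` samples."""
    if steps < 2:
        return [end]
    return [start + (end - start) * k / (steps - 1) for k in range(steps)]

def play_with_increments(pcm_in, increments):
    """Advance a fractional cursor by one increment per output sample and
    interpolate linearly; the last increment is held once the list runs out."""
    if not increments:
        raise ValueError("increments must not be empty")
    pcm_out = []
    idx = 0.0
    n = 0
    while idx < len(pcm_in) - 1:
        i = int(idx)
        frac = idx - i
        pcm_out.append(pcm_in[i] * (1.0 - frac) + pcm_in[i + 1] * frac)
        idx += increments[min(n, len(increments) - 1)]
        n += 1
    return pcm_out

# Go from 1x to 2x over 441 samples (1/100 s at 44100 Hz).
transition = ramp_increments(1.0, 2.0, 441)
largest_step = max(b - a for a, b in zip(transition, transition[1:]))

# Feeding in a ramp whose value equals its index makes the cursor path visible.
positions = play_with_increments([float(x) for x in range(2000)], transition)
```

With 441 steps the increment changes by only 1/440 per sample. Shorter ramps such as 64 or 128 samples can be tried by changing the `steps` argument.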

Phil Freihofner