Since music speed and pitch is coupled together, if I speed up music, pitch is also increased. And conversely, if I slow down music, pitch is also decreased.
However, I saw that using granular synthesis, I can decouple speed and pitch. So, I'm currently trying hard to implement granular synthesis.
First of all, I think I succeeded in implementing double speed and half speed, while pitch is same. The code is as same as the following:
※ grain size is 2000. It means that I use 0.04ms of sound as one grain. (2000 samples * 1 s / 44100 samples = 0.04s = 40ms)
// get music
const $fileInput = document.createElement('input');
$fileInput.setAttribute('type', 'file');
document.body.appendChild($fileInput);
$fileInput.addEventListener('change', async (e) => {
const music = await $fileInput.files[0].arrayBuffer();
const actx = new (window.AudioContext || window.webkitAudioContext)({ latencyHint: 'playback', sampleRate: 44100 });
const audioData = await actx.decodeAudioData(music);
const original = audioData.getChannelData(0);
const arr = [];
const grainSize = 2000;
// Please choose one code out of double speed code or half speed code
// copy and paste audio processing code here
});
// double speed
// ex: [0,1,2,3, 4,5,6,7, 8] => [0,1, 4,5, 8] discard 2 items out of 4 items
for (let i = 0; i < original.length; i += grainSize) {
if (original[i + (grainSize / 2) - 1] !== undefined) {
for (let j = 0; j < grainSize / 2; j++) {
arr.push(original[i + j]);
}
} else {
for (let j = i; j < original.length; j++) {
arr.push(j);
}
}
}
// half speed
// ex: [0,1, 2,3, 4] => [0,1,0,0, 2,3,0,0, 4,0,0] add 'two' zeros after every 'two' items
for (let i = 0; i < original.length; i += grainSize) {
if (original[i + grainSize - 1] !== undefined) {
for (let j = 0; j < grainSize; j++) {
arr.push(original[i + j]);
}
} else {
for (let j = i; j < original.length; j++) {
arr.push(original[j]);
}
}
for (let j = 0; j < grainSize; j++) {
arr.push(0);
}
}
// play sound
const f32Arr = Float32Array.from(arr);
const audioBuffer = new AudioBuffer({ length: arr.length, numberOfChannels: 1, sampleRate: actx.sampleRate });
audioBuffer.copyToChannel(f32Arr, 0);
const absn = new AudioBufferSourceNode(actx, { buffer: audioBuffer });
absn.connect(actx.destination);
absn.start();
But the problem is, I totally have no idea how to implement pitch shifter (that is, different pitch, same speed).
As far as I think, same speed means same AudioBuffer size. Therefore, the only variable in my hand is grain size. But I seriously don't know what should I do. It would be greatly appreciated if you share some of your knowledge. Thank you very much!
To Phil Freihofner
Hello, thank you for the kind explanation. I tried your method. As far as I understand, your method is a process that does the following:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] // input data (10 samples)
→ [0, 2, 4, 6, 8] // double speed, 1 octave high (sampling interval: 2)
→ [0, 0, 2, 2, 4, 4, 6, 6, 8, 8] // change duration
The result sounds 1 octave high with same duration (successful pitch shifting). However, I don't know what should I do if I do sampling from the input data with sampling interval 1.5? What I mean is I have no idea how to make the length of [0, 1, 3, 4, 6, 7, 9]
as the same length with the input data.
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] // input data
// from
[0, 1, 3, 4, 6, 7, 9] // sampling interval: 1.5
// to
[?, ?, ?, ?, ?, ?, ?, ?, ?, ?]
Meanwhile, I learned that pitch shift can be achieved by a way and as far as I understand, the way is as following:
- [1] make each granule be started at the same position as original source
- [2] play each granule with different speed.
In addition, I found that I can achieve pitch shifting and time stretching if I transform an input data like the following:
input data = [0, 1, 2, 3, 4, 5, 6, 7]
grain size = 4
<pitch shifting>
in case of p = 2
result = [0, 2, 0, 2, 4, 6, 4, 6] // sounds like one octave high
// If I remember correctly, [0, 2, 0, 0, 4, 6, 0, 0] is also fine
// (and it is more fit to the definition above ([1] and [2])
// but the sound was not good (stuttering).
// I found that [0, 2, 0, 2...] is better.
in case of p = 1.5
result = [0, 1, 3, 0, 4, 5, 7, 4]
in case of p = 0.5
result = [0, 0, 1, 1, 4, 4, 5, 5] // sounds like one octave low
<time stretching>
in case of speed = 2
result = [0, 1, 4, 5]
in case of speed = 1.2
result = [0, 1, 2, 4, 5, 6]
in case of speed = 0.5
result = [0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7]
// If I remember correctly, [0, 1, 2, 3, 0, 0, 0, 0...] is also fine
// but the sound was not good (stuttering)
in case of speed = 0.75
result = [0, 1, 2, 3, 0, 4, 5, 6, 7, 4]
Anyway, thank you for the answer.