Apologizing if the answer is on the site somewhere (couldn't find). I'm a hobbyist who tries to load WAV file, get it's magnitude and phase data (for modification), generate spectrogram and then save it back as a new WAV file. I use C++ (Qt) and FFTW library.
My problem is that resulting WAV differs from original even when no modifications are made. If FFT operations are performed on whole sample sequence, then it looks just like the original. But I have to use STFTs with overlapping windows. In this case I get distortions resulting in periodic cracking/throttling sounds, also waveform of audio is significantly changed.
This can be seen in following examples (viewed in Audacity):
original / processed in one chunk: original
processed (windowSize=2048, hopSize=1024, no window function): processed ws=2048, hs=1024, wf=none
I can't post more examples with my reputation, but performing Hamming window function after ISTFT (not before STFT) with the method I use to combine resulting windowed samples gives good sound. But waveform is still quite different, mainly significant loss in peaks is observed.
I think the way I combine result of ISTFT into new sample sequence is the problem. What is the proper way to do this? Example in C++ would be really appreciated.
- EDIT * As correctly pointed out by SleuthEye I made a mistake in code. Code is adjusted. Waveform and sound seems to be perfect now even without applying a window function. Still, is the method correct for such an operation?
Here's the relevant source:
// getSampleNormalized(n) returns sample n of 1 channel in -1.0 to 1.0 range
// getSampleCount() returns sample count of 1 channel
// quint32 is just unsigned int
quint32 windowSize = 2048;
quint32 windowSizeHalf = windowSize / 2 + 1;
quint32 slideWindowBy = 1024; // hopSize
quint32 windowCount = getSampleCount() / slideWindowBy;
if ( (windowCount * slideWindowBy) < getSampleCount()){
windowCount += 1;
}
quint32 newSampleCount = windowCount * slideWindowBy + ( windowSize - slideWindowBy );
double *window = new double[windowSize];
fftw_complex *fftResult = new fftw_complex[windowSizeHalf];
fftw_complex *fftWindow = new fftw_complex[windowSizeHalf];
double *result = new double[windowSize];
double **magnitudes = new double*[windowCount];
double **phases = new double*[windowCount];
double **signalWindows = new double*[windowCount];
for (int i = 0; i < windowCount; ++i){
magnitudes[i] = new double[windowSizeHalf];
phases[i] = new double[windowSizeHalf];
signalWindows[i] = new double[windowSize];
}
double *sampleSignals = new double[newSampleCount];
fftw_plan fftPlan = fftw_plan_dft_r2c_1d( windowSize, window, fftResult, FFTW_ESTIMATE );
fftw_plan ifftPlan = fftw_plan_dft_c2r_1d( windowSize, fftWindow, result, FFTW_ESTIMATE );
// STFT
for ( int currentWindow = 0; currentWindow < windowCount; ++currentWindow ){
for (int i = 0; i < windowSize; ++i){
quint32 currentSample = currentWindow * slideWindowBy + i;
if ( ( currentSample ) < getSampleCount() ){
window[i] = getSampleNormalized( currentSample ); // * ( windowHamming( i, windowSize ) );
}
else{
window[i] = 0.0;
}
}
fftw_execute(fftPlan);
for (int i = 0; i < windowSizeHalf; ++i){
magnitudes[currentWindow][i] = sqrt( fftResult[i][0]*fftResult[i][0] + fftResult[i][1]*fftResult[i][1] );
phases[currentWindow][i] = atan2( fftResult[i][1], fftResult[i][0] );
}
}
// INVERSE STFT
for ( int currentWindow = 0; currentWindow < windowCount; ++currentWindow ){
for ( int i = 0; i < windowSizeHalf; ++i ){
fftWindow[i][0] = magnitudes[currentWindow][i] * cos( phases[currentWindow][i] ); // Real
fftWindow[i][1] = magnitudes[currentWindow][i] * sin( phases[currentWindow][i] ); // Imaginary
}
fftw_execute(ifftPlan);
for ( int i = 0; i < windowSize; ++i ){
signalWindows[currentWindow][i] = result[i] / windowSize; // getting normalized result
//signalWindows[currentWindow][i] *= (windowHamming( i, windowSize )); // applying Hamming window function
}
}
quint32 pos;
// HERE WE COMBINE RESULTED WINDOWS
// COMBINE AND AVERAGE
// 1st window should be full replace
for ( int i = 0; i < windowSize; ++i ){
sampleSignals[i] = signalWindows[0][i];
}
// 2nd window and onwards: combine with previous ones
for ( int currentWindow = 1; currentWindow < windowCount; ++currentWindow ){
// combine and average with data from previous window
for ( int i = 0; i < (windowSize - slideWindowBy); ++i ){
pos = currentWindow * slideWindowBy + i;
sampleSignals[pos] = (sampleSignals[pos] + signalWindows[currentWindow][i]) * 0.5;
}
// simply replace for the rest
for ( int i = (windowSize - slideWindowBy); i < windowSize; ++i ){
pos = currentWindow * slideWindowBy + i;
sampleSignals[pos] = signalWindows[currentWindow][i];
}
}
// then just save the wav file...