I have some FFT data, 257 dimensions, every 10 ms, with 121 frames, i.e. 1.21 secs. I think the first dimension is probably something else and the remaining are the FFT coefficients, I guess. It's probably just spectogram data. From a comment about the FFT data, sqrt10 and mean-variance-normalization might have been applied on it.
From there, I want to calculate back some PCM signal for 44.1 Hz so I can play the sound. I asked the same question in a more mathematical way here but maybe StackOverflow is a better place because I actually want to implement this. I also asked the same question about the theory here on DSP SE.
How would I do that? Maybe I need some more information (which I have to find out somehow) - which? Maybe these missing information can be intelligently guessed somehow?
This question is both about the theory and practical implementation. The implementation is trivial I guess. But a concrete example in some language would be nice to help understanding the theory. Maybe C++ with FFTW? I skipped through the FFTW docs but I fail to understand all the terminology and some background, e.g. here. Why is it from complex to real or the other way, I only want real to real. What are those REDFT? What's a DCT, DFT, DST? FFTW_HC2R?
I read all the FFT data, i.e. 121 * 257 floats, into a vector freq_bins
.
std::vector<float32_t> freq_bins; // FFT data
int freq_bins_count = 257;
size_t len = 121;
std::vector<float32_t> pcm; // output, PCM data
int N = freq_bins_count;
std::vector<double> out(N), orig_in(N);
// inspiration: https://stackoverflow.com/questions/2459295/invertible-stft-and-istft-in-python/6891772#6891772
for(int f = 0; f < len; ++f) {
size_t pos = freq_bins_count * f;
for(int i = 0; i < N; ++i)
out[i] = pow(freq_bins[pos + i] + offset, 10); // fft was sqrt10 + mvn
fftw_plan q = fftw_plan_r2r_1d(N, &out[0], &orig_in[0], FFTW_REDFT00, FFTW_ESTIMATE);
fftw_execute(q);
fftw_destroy_plan(q);
// naive overlap-and-add
auto start_frame = size_t(f * dt * sampleRate);
for(int i = 0; i < N; ++i) {
sample_t frame = orig_in[i] * scale / (2 * (N - 1));
size_t idx = start_frame + i;
while(idx >= pcm.size())
pcm.push_back(0);
pcm[idx] += frame;
}
}
But this is wrong, I guess. I just get garbage out.
Related might be this question. Or this.