I'm currently trying to calculate the frequency response of the iPhone's speaker/microphone round trip. I play a sine sweep on the speaker, record it via the microphone, and try to get the frequency response out of that. The final goal is to be able to multiply any given sound by the FR to make it sound like the iPhone's speaker/mic.

My code so far:

    //apply window function
    vDSP_vmul(sineSweepMic,1,hammingWindow,1,sineSweepMic,1,n);
    vDSP_vmul(sineSweepFile,1,hammingWindow,1,sineSweepFile,1,n);

    //put both signals in complex arrays
    vDSP_ctoz((DSPComplex *)sineSweepMic, 2, &fftSineSweepMic, 1, nOver2);
    vDSP_ctoz((DSPComplex *)sineSweepFile, 2, &fftSineSweepFile, 1, nOver2);

    //fft of both file and mic sweeps
    vDSP_fft_zrip(fftSetup, &fftSineSweepFile, 1, log2n, FFT_FORWARD);
    vDSP_fft_zrip(fftSetup, &fftSineSweepMic, 1, log2n, FFT_FORWARD);

    //back to interleaved
    vDSP_ztoc(&fftSineSweepFile, 1, (COMPLEX *)sineSweepFile, 2, nOver2);
    vDSP_ztoc(&fftSineSweepMic, 1, (COMPLEX *)sineSweepMic, 2, nOver2);

    //divide mic-sweep by file-sweep to create frequency response
    vDSP_vdiv(sineSweepFile, 1, sineSweepMic, 1, frequencyResponse, 1, n);

This works so far: when I multiply the FR with the initial file sweep, it sounds like the mic sweep.

My problem: this only works for the exact file (sweep) the FR was generated from. As soon as I use the FR to modify other sounds, music for example, only noise comes out.

I use the FR like this (both in the frequency domain, interleaved, not complex, and the same length):

    vDSP_vmul(soundToModify, 1, frequencyResponse, 1, soundToModify, 1, n);

My sine sweep from the file, played on the speaker: [image]

My recorded sine sweep (attenuated low frequencies visible): [image]

My file sine sweep multiplied in the frequency domain with the FR generated by the code above: [image]

My goal: as I understand it, the frequency response tells you, for each frequency, how much the system attenuates or amplifies it (in my example the system cannot reproduce low frequencies). To get this information I generate a sound containing every desired frequency (the sine sweep), play it, and analyze how each frequency is modified by dividing the recorded sweep by the file sweep (the division in the code).

Multiplying any sound by this FR in the frequency domain should then modify its frequency amplitudes to mimic playback on my system, right?

Thanks!


UPDATE: In the end the fault was the missing complex arithmetic. Both the sine sweep and the pink noise worked pretty well as excitation signals for recovering the impulse response.

To get working code, just complex-divide the recorded sweep's FFT data by the initial sweep's FFT data.
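
For anyone landing here, a minimal sketch of that complex division in plain C, assuming both spectra are stored interleaved (re, im, re, im, ...) as they are after `vDSP_ztoc`. The names `complex_divide_interleaved` and `eps` are made up for illustration and are not part of vDSP. Note that `vDSP_fft_zrip` packs the purely real DC and Nyquist values into the first element, so that bin would need to be handled separately with two real divisions; the sketch does not do this.

```c
#include <stddef.h>

/* Element-wise complex division of two interleaved spectra:
 *   out[k] = num[k] / den[k]   for k = 0 .. bins-1
 * Each complex bin occupies two floats: real part, then imaginary part.
 * Bins where |den[k]|^2 <= eps are set to zero instead of being divided,
 * to avoid blowing up where the reference has (almost) no energy.
 * The eps guard is an illustrative assumption, not part of the original code. */
void complex_divide_interleaved(const float *num, const float *den,
                                float *out, size_t bins, float eps)
{
    for (size_t k = 0; k < bins; ++k) {
        float a = num[2 * k];       /* Re(num) */
        float b = num[2 * k + 1];   /* Im(num) */
        float c = den[2 * k];       /* Re(den) */
        float d = den[2 * k + 1];   /* Im(den) */
        float mag2 = c * c + d * d; /* |den|^2 */

        if (mag2 <= eps) {
            out[2 * k] = 0.0f;
            out[2 * k + 1] = 0.0f;
            continue;
        }
        /* (a + ib) / (c + id) = (a + ib)(c - id) / (c^2 + d^2) */
        out[2 * k]     = (a * c + b * d) / mag2;
        out[2 * k + 1] = (b * c - a * d) / mag2;
    }
}
```

Here `num` would be the recorded sweep's spectrum and `den` the file sweep's spectrum, with `bins` equal to `nOver2`. Applying the result to another sound then needs a complex multiply in the same way, not `vDSP_vmul`. Accelerate also provides `vDSP_zvdiv` for split-complex data, which may be the more natural fit before converting back to interleaved; check its operand order in the documentation.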

Maximilian Körner
  • If you just want to measure the frequency response then generate white (Gaussian) noise for several seconds, collect a number of FFTs and average the resulting power spectra. Also note that trying to "correct" the frequency response like this is inherently unstable numerically. Google for "deconvolution". – Paul R Jan 09 '13 at 07:47
  • I added a little more information above. Isn't my sine sweep suitable for this, as it contains every desired frequency, or will changing my analysis signal to pink noise fix my issues? Is my approach of getting the FR by dividing recorded/original after the FFT right? – Maximilian Körner Jan 10 '13 at 01:20
  • Using a (slow) sine sweep is OK if you do frequency response in the time domain - use noise for frequency domain measurements. But the bigger problem is that you're trying to do deconvolution in an inherently flawed way. – Paul R Jan 10 '13 at 07:50

1 Answer

If you want to recreate the sound of the iPhone speaker/mic, ideally you need to find the impulse response of the system.

What you are doing wrong: finding the FFT of a sine sweep is meaningless since the input frequency is something that changes (linearly or exponentially or other) to begin with, before the system imposes its own frequency response on top of that. As Paul R suggested above, finding FFTs of white noise makes more sense since averaging over many statistically-flat input frequencies will give you the actual frequency response of the system.

However, if your goal is to recreate the sound of the system, you also need to take care of phase, which neither of the above methods does. The 'ideal' way would be to capture the response of the iPhone speaker/mic system to an 'impulse' in a perfectly quiet and dry (no reflections) environment. There are 3 ways to do so:

1. Use a balloon pop or a synthetically created impulse sound.
2. Use Golay codes, which are a simpler way of averaging many impulse response measurements.
3. Use sine sweeps, then use correlation to find the impulse response.

Reference: https://ccrma.stanford.edu/realsimple/imp_meas/imp_meas.pdf

Once you obtain the impulse response measurement, either convolve this with the signal you are trying to 'color', or take the FFT of both signals, multiply in the frequency domain, and then take inverse FFT to get the colored signal.
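
As a rough illustration of the time-domain option, here is a minimal direct-convolution sketch in plain C. The function name `convolve` and its parameters are hypothetical, and the O(N*M) loop is only practical for short impulse responses; it is meant to show the operation, not to be a tuned implementation.

```c
#include <stddef.h>

/* Direct (time-domain) linear convolution: y = x * h.
 * y must have room for x_len + h_len - 1 floats.
 * Does nothing if either input is empty. */
void convolve(const float *x, size_t x_len,
              const float *h, size_t h_len,
              float *y)
{
    if (x_len == 0 || h_len == 0) {
        return;
    }

    size_t y_len = x_len + h_len - 1;
    for (size_t n = 0; n < y_len; ++n) {
        y[n] = 0.0f;
    }

    /* Each input sample launches a scaled copy of the impulse response. */
    for (size_t i = 0; i < x_len; ++i) {
        for (size_t j = 0; j < h_len; ++j) {
            y[i + j] += x[i] * h[j];
        }
    }
}
```

If you take the frequency-domain route instead, both signals need to be zero-padded to at least `x_len + h_len - 1` samples before the FFT; otherwise the multiplication gives a circular convolution and the tail wraps around onto the start.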

Explanation: I'll try to explain it to the best of my knowledge.

When you take the FR of an impulse response, you take the magnitude of its FFT and throw away the phase data. Therefore, there are many filters (systems) with the same magnitude FR that will give you radically different outputs. A case in point is allpass filters: they all have a flat FR, but if you put an impulse through them you can get back a sine sweep, depending on the filter parameters. So although you can always go FROM an IR to an FR, going back in the opposite direction means making an arbitrary choice of phase. Hence you cannot throw away the phase, even for rough estimates. The fact that we cannot hear phase means that we can look at the FR for information about the system, but it does not allow us to disregard phase when modeling the system. I hope that makes sense?

To use a sine sweep, do the following: if s(t) = sin(A(t)) and A(t) = integral[0 to t] (w(t) dt), then the correlation e(t) = corr(v(t), sin(A(t))), where v(t) = 2 * abs(dw/dt), will produce an impulse. Therefore, if you replace the sine sweep in that correlation with the measured signal, you should obtain its impulse response.

Hope that helps! Sorry for it being so math-y.
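
The correlation step can be sketched in plain C as below. This is only an assumption-laden illustration: `cross_correlate` and its parameters are hypothetical names, the v(t) weighting from the explanation is left out, and only non-negative lags are computed (the recording is assumed to start no earlier than the reference sweep).

```c
#include <stddef.h>

/* Cross-correlation of a measured signal with a reference signal:
 *   r[lag] = sum over n of measured[n + lag] * reference[n]
 * for lag = 0 .. lags-1. Samples past the end of `measured`
 * are treated as zero. */
void cross_correlate(const float *measured, size_t m_len,
                     const float *reference, size_t r_len,
                     float *r, size_t lags)
{
    for (size_t lag = 0; lag < lags; ++lag) {
        float acc = 0.0f;
        for (size_t n = 0; n < r_len && n + lag < m_len; ++n) {
            acc += measured[n + lag] * reference[n];
        }
        r[lag] = acc;
    }
}
```

With `measured` set to the microphone recording and `reference` to the sweep that was played, a peak in `r` marks the delay at which the recording best matches the sweep, and for a well-behaved sweep the shape around that peak approximates the impulse response.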

e7mac
  • In my understanding, the impulse response is the inverse Fourier transform of the frequency response (as mentioned, I'm new to all of this, so please feel free to prove me wrong! :D). Is the change from sine sweep to pink/white noise significant, given that the FFT data is time-invariant (?) and only the overall power of each frequency is relevant? And since I correlate the signals first to align them as well as possible, they should contain the same data. As I don't need perfect data but only a rough estimate, could I leave phase out and just focus on modifying the frequency amplitudes via the FR? Thanks for the reference! – Maximilian Körner Jan 10 '13 at 01:31
  • OK, I think I now have an idea of how IR and FR are related. Back to my approach: as I understand it, all I did IS create an impulse response with the sine sweep method (explained [here](http://www.dspguide.com/ch9/2.htm)): 1. play the sine sweep and record it; 2. deconvolve the two to cancel out the actual sweep data and leave the response (with vDSP_vdiv(), as deconvolution is division in the frequency domain); 3. apply the IR via convolution (or in my case the FR via multiplication). Could my fault be that COMPLEX division/multiplication cannot be done with these functions, which are meant for REAL numbers? – Maximilian Körner Jan 10 '13 at 03:22
  • Yeah, the arithmetic in the frequency domain needs to be complex for sure; that is the issue. By taking the magnitude of the complex numbers you are throwing away required data. Either try doing the complex arithmetic and post what you got, or try the arithmetic I posted in the explanation, which keeps it real. Do write about the results and what finally works! – e7mac Jan 12 '13 at 00:55