Detecting a low frequency tone in an audio file

Question

I know this question has been asked a hundred times, but I am getting frustrated with my results, so I wanted to ask again. Before I dive deep into FFT, I need to figure this simple task out.

I need to detect a 20 Hz tone in an audio file. I insert the 20 Hz tone myself, as in the picture. (It can be any frequency as long as the listener can't hear it, so I thought I should choose a frequency between 20 Hz and 50 Hz.)

[image: the audio file with the inserted 20 Hz tone]

Info about the audio file:

afinfo 1.m4a 
File:           1.m4a
File type ID:   adts
Num Tracks:     1
----
Data format:     1 ch,  22050 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Mono
estimated duration: 8.634043 sec
audio bytes: 42416
audio packets: 219
bit rate: 33364 bits per second
packet size upper bound: 768
maximum packet size: 319
audio data file offset: 0
optimized
format list:
[ 0] format:      1 ch,  22050 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Mono
----

I followed these three tutorials and came up with working code that reads the audio buffer and gives me FFT doubles.

http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html
https://github.com/alexbw/iPhoneFFT
How do I obtain the frequencies of each value in an FFT?

I read the data as follows:

// If there's more packets, read them
        inCompleteAQBuffer->mAudioDataByteSize = numBytes;
        CheckError(AudioQueueEnqueueBuffer(inAQ,
                                           inCompleteAQBuffer,
                                           (sound->packetDescs?nPackets:0),
                                           sound->packetDescs),
                   "couldn't enqueue buffer");
        sound->packetPosition += nPackets;


        int numFrequencies=2048;
        int kNumFFTWindows=10;

        SInt16 *testBuffer = (SInt16*)inCompleteAQBuffer->mAudioData; //Read data from buffer...!

        OouraFFT *myFFT = [[OouraFFT alloc] initForSignalsOfLength:numFrequencies*2 andNumWindows:kNumFFTWindows];
        for(long i=0; i<myFFT.dataLength; i++)
        {
            myFFT.inputData[i] = (double)testBuffer[i];

        }
        [myFFT calculateWelchPeriodogramWithNewSignalSegment];
        for (int i=0;i<myFFT.dataLength/2;i++) {
            NSLog(@"the spectrum data %d is  %f ",i,myFFT.spectrumData[i]);
        }

and my output log looks something like this:

Everything checks out for 4096 samples of data
Set up all values, about to init window type 2
the spectrum data 0 is  42449.823771 
the spectrum data 1 is  39561.024361 
.
.
.
.
the spectrum data 2047 is  -42859933071799162597786649755206634193030992632381393031503716729604050285238471034480950745056828418192654328314899253768124076782117157451993697900895932215179138987660717342012863875797337184571512678648234639360.000000

I know I am not calculating the magnitude yet, but how can I detect that the sound has a 20 Hz tone in it? Do I need to learn the Goertzel algorithm?

From your picture it is not clear to me whether you are bursting a 20 Hz sine at a lower frequency, or a higher-frequency sine at 20 Hz. — jaket, Mar 06 '15 at 17:41
Do low-pass filtering first. Then use autocorrelation; it is usually better than the harmonics in FFT for noisy signals. — Sten, Mar 06 '15 at 17:44
If those pulses in the bottom plot are 20Hz single cycles, then you're not going to be able to easily pick them up reliably in real-time with an FFT or Goertzel's algorithm. If you're using them as hidden markers in the file, then I would low-pass filter the data and then autocorrelate or even just watch for sections of the right width above some threshold value. (Edited: autocorrelation won't pick up those single cycles well) — Katie, Mar 06 '15 at 17:44
Also I see you're working with AAC - you may want to confirm that the added tone shows up in the final file by looking at it after compression in Audacity. Audio compression works by throwing out parts of the signal that can't be heard, so it may be removing the 20Hz tone before you get a chance to process it. — Katie, Mar 06 '15 at 17:50
@Katie yes, I checked the audio with several programs and made sure that the low frequency is not trimmed. My initial idea was to insert 5 cycles at 20 Hz with a large magnitude, so that with a simple counter, when I get a positive boolean 5 times, I just go ahead and apply my logic. Bad idea? — Mord Fustang, Mar 06 '15 at 18:49
@MordFustang, not a bad plan as long as you don't make the magnitude too large. If you do, then you may start clipping the source audio. Running it through a low-pass filter should simplify things enough that you don't have false positives. However, I would probably compute the correlation against a signal that matches the tone you're adding and watch for that to cross some threshold rather than just examining the raw signal level. Because it is so low frequency you can correlate at a downsampled (100Hz or so) sample rate and not need much CPU time to do it. — Katie, Mar 06 '15 at 18:54
FFT is complete overkill for this - just use the [Goertzel Algorithm](http://en.m.wikipedia.org/wiki/Goertzel_algorithm). — Paul R, Mar 07 '15 at 08:26
@PaulR I guess I will go with Goertzel. I have just learned that as long as the total length of the tone is less than 500 ms, I can add it to the beginning or end of the file. So I found some code and tested it, but the float returns `inf`. I guess I will post another question for that. — Mord Fustang, Mar 09 '15 at 17:14

Scott Stensland · Answer 1 · 2015-03-06T19:02:04.087

There are many ways to convey information by inserting it into, and then retrieving it from, some preexisting wave pattern. The information going in can vary things like the amplitude (amplitude modulation) or the frequency (frequency modulation), etc. Do you have a strategy here? Note that the density of information you wish to convey is influenced by factors such as the modulating frequency (higher frequencies can naturally convey more information because they can resolve changes more times per second).

Another approach is possible if both the sender and receiver have the source audio (the reference). In this case the receiver could do a diff between the reference and the audio actually received to recover the transmitted extra information. A variation on this would be to have the sender send the same audio twice: first the untouched reference audio, followed by a modulated version of that same reference audio. That way the receiver just does a diff between these two audibly identical clips to recover the embedded audio.
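The diff described above can be sketched as a sample-by-sample subtraction. This is only an illustration under strong assumptions: both clips are decoded PCM, have the same length, and are already time-aligned. The function name `subtract_reference` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Subtract the reference clip from the received clip, sample by sample.
   Assumes both clips are time-aligned PCM of the same length. The result
   is stored as 32-bit values because the difference of two 16-bit
   samples can exceed the 16-bit range. */
void subtract_reference(const int16_t *received, const int16_t *reference,
                        int32_t *residual, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        residual[i] = (int32_t)received[i] - (int32_t)reference[i];
    }
}
```

Note that a lossy codec such as AAC changes the sample values, so in practice the residual would contain codec error in addition to the embedded signal.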

Going back to your original question: suppose the sender and receiver have an agreement. Say that for some time period X the pure 20 Hz reference tone is sent, followed by another period X in which that 20 Hz tone is modulated by your input information to alter either its amplitude or its frequency, and then this pattern repeats. On the receiving side, they just do a diff between each such pair of time periods to recover your modulated information. For this to work, the source audio cannot have any tones below some frequency, say 100 Hz (remove that frequency band if needed), to eliminate interference from the source audio.

You have not mentioned what kind of data you wish to transmit. If it's voice, you would first need to stretch it out, in effect lowering its frequency range from the 1 kHz range down to your low 20 Hz range. Once the result of the diff is available on the receiving side, you then squeeze this curve to restore it to the normal voice range of 1 kHz. This may be more work than you have time for, but it might just work. Real AM/FM radio uses modulation to send voice over megahertz carrier frequencies, so it can work.

Thanks for the ideas. The receiver doesn't have the audio file to compare; we are trying to stream from an HTTP server. If I use a higher frequency, wouldn't it be detectable by the listener? The file format has to be 22050 Hz AAC. — Mord Fustang, Mar 06 '15 at 18:46
