
I am trying to create an iOS app that will perform an action when it detects a clapping sound.

Things I've tried:

1) My first approach was simply to measure the overall power using an AVAudioRecorder. This worked OK, but it could be set off by loud talking, other noises, etc., so I decided to take a different approach.
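For reference, approach 1 boils down to "compute one loudness number per frame and compare it to a threshold". AVAudioRecorder exposes this through its metering API (`isMeteringEnabled`, `updateMeters()`, `averagePower(forChannel:)`), which reports a level in dBFS. The sketch below computes a similar RMS level directly from a frame of `Float` samples so the idea can be tested off-device; `rmsLevelDb`, `isLoudEvent` and the -10 dB threshold are hypothetical names and values, not part of any Apple API.

```swift
import Foundation

/// RMS level of one frame in dB relative to full scale (sample value 1.0).
/// Silence, or an empty frame, is clamped to `floorDb`.
func rmsLevelDb(_ frame: [Float], floorDb: Double = -160) -> Double {
    guard !frame.isEmpty else { return floorDb }
    let meanSquare = frame.reduce(0.0) { $0 + Double($1) * Double($1) } / Double(frame.count)
    guard meanSquare > 0 else { return floorDb }
    // 10 * log10(mean square) == 20 * log10(RMS)
    return max(floorDb, 10 * log10(meanSquare))
}

/// Approach 1: fire whenever the overall level crosses a fixed threshold.
/// The measure ignores frequency content, so loud speech trips it as easily as a clap.
func isLoudEvent(_ frame: [Float], thresholdDb: Double = -10) -> Bool {
    rmsLevelDb(frame) > thresholdDb
}
```

Because a single broadband level says nothing about *what* made the sound, any sufficiently loud noise passes the test, which matches the false triggers described above.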

2) I then implemented some code that uses an FFT to get the frequency and magnitude of the live audio stream from the microphone. I found that the clap spike generally lies in the 13 kHz to 20 kHz range, while most speech lies at much lower frequencies. I then applied a simple threshold in this frequency range. This worked OK, but other sounds could set it off; for example, dropping a pencil on the table right next to my phone would pass the threshold and be counted as a clap.
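Approach 2 can be sketched as "sum the spectral energy between 13 kHz and 20 kHz and threshold it". The version below assumes mono `Float` frames and uses a direct DFT restricted to the bins of interest so it has no dependencies; on iOS you would more likely use an FFT from the Accelerate framework (vDSP). The band is only representable if the sample rate exceeds twice 20 kHz; at 44.1 kHz the Nyquist frequency is 22.05 kHz. `bandEnergy`, `isHighBandEvent` and the threshold parameter are hypothetical.

```swift
import Foundation

/// Sum of squared DFT magnitudes over the bins whose frequency lies in
/// [lowHz, highHz]. Direct O(N * K) DFT over only those K bins.
func bandEnergy(_ frame: [Float], sampleRate: Double,
                lowHz: Double, highHz: Double) -> Double {
    let n = frame.count
    guard n > 0 else { return 0 }
    let binHz = sampleRate / Double(n)
    let kLow = max(1, Int((lowHz / binHz).rounded(.up)))
    let kHigh = min(n / 2, Int((highHz / binHz).rounded(.down)))
    guard kLow <= kHigh else { return 0 }
    var energy = 0.0
    for k in kLow...kHigh {
        let w = 2 * Double.pi * Double(k) / Double(n)
        var re = 0.0, im = 0.0
        for (i, x) in frame.enumerated() {
            re += Double(x) * cos(w * Double(i))
            im -= Double(x) * sin(w * Double(i))
        }
        energy += re * re + im * im
    }
    return energy
}

/// Approach 2: threshold on 13 to 20 kHz energy only (threshold value is hypothetical).
func isHighBandEvent(_ frame: [Float], sampleRate: Double, threshold: Double) -> Bool {
    bandEnergy(frame, sampleRate: sampleRate, lowHz: 13_000, highHz: 20_000) > threshold
}
```

This rejects speech, whose energy sits far below 13 kHz, but any short sharp impact (such as the dropped pencil) also puts energy into the high band, so a single-band threshold cannot tell the two apart.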

3) I then tried splitting this frequency range into a couple hundred bins and collecting enough data so that, when a sound passed the threshold, my app would calculate the z-score (how many standard deviations a value lies from the mean) and, if the z-score was good, count the sound as a clap. This did not work at all: some claps were not recognized and some other sounds were.
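A per-bin z-score test amounts to a simple Gaussian model: learn the mean μ and standard deviation σ of each bin from recorded claps, then score a new spectrum by z = (x − μ) / σ in every bin. The question does not say how the per-bin z-scores were combined, so averaging their absolute values is an assumption here; `ClapModel` and `meanAbsZ` are hypothetical names. The same code works unchanged on a handful of coarse bands instead of hundreds of bins.

```swift
/// Per-bin Gaussian model of clap spectra learned from training vectors
/// (all vectors must have the same length).
struct ClapModel {
    let mean: [Double]
    let std: [Double]

    init(training: [[Double]]) {
        precondition(!training.isEmpty, "need at least one training vector")
        let dims = training[0].count
        let count = Double(training.count)
        var mean = [Double](repeating: 0, count: dims)
        for v in training { for d in 0..<dims { mean[d] += v[d] / count } }
        var variance = [Double](repeating: 0, count: dims)
        for v in training {
            for d in 0..<dims { variance[d] += (v[d] - mean[d]) * (v[d] - mean[d]) / count }
        }
        self.mean = mean
        // Floor sigma so a bin that never varied in training cannot cause division by zero.
        self.std = variance.map { max($0.squareRoot(), 1e-9) }
    }

    /// Mean of |z_d| = |x_d - mean_d| / std_d over all bins; smaller means "more clap-like".
    func meanAbsZ(_ x: [Double]) -> Double {
        precondition(x.count == mean.count, "dimension mismatch")
        var total = 0.0
        for d in 0..<x.count { total += abs((x[d] - mean[d]) / std[d]) }
        return total / Double(x.count)
    }
}
```

With hundreds of bins and limited training data, each σ is estimated from few examples, which is one plausible reason such a model generalizes poorly.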

Graph:

To help me understand how to detect claps, I made this graph in Excel (each graph has around 800 data points); it covers the 13 kHz to 21 kHz range:

[Image: Clapping Graph]

Where I am now:

Even after all of this, I still don't see how to tell a clap apart from other sounds. Any help is greatly appreciated!

Praxder
  • It's a non-trivial problem: you need to extract relatively invariant *features* from your time-frequency data and use these features for comparison. – Paul R Jul 14 '14 at 21:07
  • Also try looking at *spectrograms* rather than just individual power spectra. There is a lot of information in the time-varying envelope of the frequency components, which you miss entirely if you only look at a single power spectrum. – Paul R Jul 14 '14 at 21:15
  • Search for "clap" on Stack Overflow. See for instance http://stackoverflow.com/questions/980673/clap-sound-detection-in-c-sharp – dpwe Jul 14 '14 at 21:52
  • I have searched for this on SO before and went through a ton of links (I have actually already read that one), but nothing I have found has worked. – Praxder Jul 14 '14 at 22:10
  • I think hundreds of bins is too many. The z-score approach (which I think amounts to a Gaussian model) on maybe 6-10 octave bands might generalize better. Other considerations include (a) normalize by overall energy first, so that the model will match regardless of a variable amplitude, and (b) make your system respond only to rapid onsets in energy, for instance by putting a threshold on the amount by which the energy increases across successive frames. – dpwe Jul 14 '14 at 22:30
  • Good answers here http://stackoverflow.com/questions/499795/given-an-audio-stream-find-when-a-door-slams-sound-pressure-level-calculation – Nikolay Shmyrev Dec 11 '14 at 21:12
  • @Shredder2794 I know this question is old, but how were you able to get the audio data from streaming sound and/or files on disk? I have a similar project, but I can't figure out how to extract individual audio samples in iOS to do a Fourier Transform. Ideally I'd end up with an array of Floats to perform calculations on. – Hundley Jun 24 '15 at 01:30
  • @hundley Here is some code that I used, although I can't guarantee that it's bug free: [http://pastie.org/10256802](http://pastie.org/10256802) – Praxder Jun 24 '15 at 16:00
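dpwe's suggestion of 6 to 10 octave bands, normalized by overall energy, can be sketched as a per-frame feature vector. This assumes mono `Float` frames and hypothetical, roughly octave-spaced band edges; the frame is Hann-windowed and a direct DFT is used only to keep the example self-contained. Because each band is divided by the total, the features are unchanged when the same sound is played louder or quieter. `normalizedBandEnergies` is a hypothetical name.

```swift
import Foundation

/// Fraction of one frame's energy in each band. Band b covers
/// [edgesHz[b], edgesHz[b + 1]); the default edges are hypothetical,
/// about one octave apart, with the top band stopping at 22.05 kHz.
func normalizedBandEnergies(_ frame: [Float], sampleRate: Double,
                            edgesHz: [Double] = [125, 250, 500, 1000, 2000,
                                                 4000, 8000, 16000, 22050]) -> [Double] {
    let n = frame.count
    var bands = [Double](repeating: 0, count: max(edgesHz.count - 1, 0))
    guard n > 1, !bands.isEmpty else { return bands }
    // Hann window: limits spectral leakage from one band into distant ones.
    let windowed: [Double] = (0..<n).map { i in
        Double(frame[i]) * 0.5 * (1 - cos(2 * Double.pi * Double(i) / Double(n - 1)))
    }
    for k in 1...(n / 2) {
        let freq = Double(k) * sampleRate / Double(n)
        // Skip bins that fall outside every band.
        guard let b = (0..<bands.count).first(where: {
            freq >= edgesHz[$0] && freq < edgesHz[$0 + 1]
        }) else { continue }
        let w = 2 * Double.pi * Double(k) / Double(n)
        var re = 0.0, im = 0.0
        for i in 0..<n {
            re += windowed[i] * cos(w * Double(i))
            im -= windowed[i] * sin(w * Double(i))
        }
        bands[b] += re * re + im * im
    }
    // Normalize by total energy so the vector describes spectral shape, not loudness.
    let total = bands.reduce(0, +)
    return total > 0 ? bands.map { $0 / total } : bands
}
```

An 8-element vector like this can feed the same mean/standard-deviation scoring described in approach 3, with far fewer parameters to estimate than a couple hundred bins.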
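dpwe's second suggestion, responding only to rapid onsets, and Paul R's point about the time-varying envelope both rely on looking at energy frame by frame instead of in a single spectrum. A minimal sketch, assuming non-overlapping frames of mono `Float` samples: a frame counts as an onset if its energy exceeds `ratio` times the previous frame's energy (a ratio of 4 is about 6 dB) and is above a small floor. `onsetFrames`, the ratio and the floor are hypothetical choices; in a real detector you would probably apply the same test to the high-band or per-band energies rather than to the raw signal.

```swift
/// Indices of frames whose energy jumps by more than `ratio` over the previous
/// frame and is at least `minEnergy`. Frames do not overlap; a trailing
/// partial frame is ignored.
func onsetFrames(_ samples: [Float], frameSize: Int,
                 ratio: Double = 4, minEnergy: Double = 1e-3) -> [Int] {
    precondition(frameSize > 0, "frameSize must be positive")
    let frameCount = samples.count / frameSize
    guard frameCount > 1 else { return [] }
    // Energy (sum of squares) of each frame: a coarse envelope of the signal.
    let energies: [Double] = (0..<frameCount).map { f in
        samples[(f * frameSize)..<((f + 1) * frameSize)]
            .reduce(0.0) { $0 + Double($1) * Double($1) }
    }
    var onsets: [Int] = []
    for f in 1..<frameCount
    where energies[f] >= minEnergy && energies[f] > ratio * energies[f - 1] {
        onsets.append(f)
    }
    return onsets
}
```

A steady sound, however loud, never produces a jump between consecutive frames and is ignored, while an abrupt burst after quiet is flagged.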

0 Answers