Detect specific sound in audio

Question

I have a short (~1 second) arbitrary sound file and two devices. At some unknown time, device 1 will play the sound file out of its speaker. Device 2 should then be able to detect that sound. There may be background noise. It's unknown how loud the sound will be played.

This feels like it should be a common solved problem, but searching for answers has left me with nothing.

If anyone has a good solution or could just point me in the right direction, I'd be very grateful.

You're right that this problem has been "solved" dozens of times already. However, there's definitely no silver bullet for it, and how it should be solved in your case depends entirely on the accuracy you require. It would help a lot to know what you are working with. — Simo Erkinheimo, Dec 30 '14 at 08:45
I'm working with two smartphones and mostly just sort of messing about with them. Can you link me to one of the existing solutions? I feel like this thing has a name and just nobody told me what it is. — Joel, Dec 30 '14 at 08:54
You can probably do this with [cross correlation](https://en.wikipedia.org/wiki/Cross-correlation), but at this point this is more of a DSP theory question than an actual programming question, so you should probably migrate your question to http://dsp.stackexchange.com — Paul R, Dec 30 '14 at 08:54
There are many variables to pin down before any solution can be suggested. Is there only one sound that should be recognized, or many different ones? Can we talk about recognizing "tones" instead of "sounds"? In real life, a sound wave is never exactly the same, and on top of that you know you'll have some noise. How much noise, and how do you define when the sound is actually the correct one? If you're thinking something like "I can easily recognize it, why wouldn't my software?"... well, there's [Machine perception](http://en.wikipedia.org/wiki/Machine_perception), but I doubt you want to touch that one. — Simo Erkinheimo, Dec 30 '14 at 09:17

score 4 · Accepted Answer · answered Jan 01 '15 at 13:43

In most distance measurement and room impulse response measurement setups, researchers use maximum length sequence (MLS) or sine sweep signals. These signals are played back and recorded. The recording is then processed together with the inverted original signal to identify the presence of the played-back audio. MLS and sine sweep signals are very robust even in noisy environments. Each has its own advantages.
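As a rough illustration of the detection step, here is a minimal Python sketch using normalized cross-correlation (correlating with a reference is equivalent to convolving with its time-reversed copy). It is not the exact processing used in the measurement setups mentioned above. The names `make_chirp`, `normalized_xcorr_peak`, and `detect`, the sweep frequencies, and the 0.5 threshold are all assumptions made for the example. Normalizing each window makes the score independent of playback volume, which matters here because the loudness is unknown.

```python
import math
import random


def make_chirp(n, f_start=0.05, f_end=0.25):
    """Hypothetical reference sound: a linear sine sweep (frequencies in cycles/sample)."""
    rate = (f_end - f_start) / n
    return [math.sin(2 * math.pi * (f_start * i + 0.5 * rate * i * i)) for i in range(n)]


def normalized_xcorr_peak(recording, template):
    """Slide the template over the recording.

    Returns (best_offset, best_score); the score is a correlation
    coefficient in [-1, 1], so it does not depend on how loud the
    sound was played. best_offset is None if no window scores above 0.
    """
    n = len(template)
    t_mean = sum(template) / n
    t = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t))
    best_offset, best_score = None, 0.0
    for offset in range(len(recording) - n + 1):
        window = recording[offset:offset + n]
        w_mean = sum(window) / n
        w = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w))
        if t_norm == 0 or w_norm == 0:
            continue  # a flat window cannot match anything
        score = sum(a * b for a, b in zip(t, w)) / (t_norm * w_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


def detect(recording, template, threshold=0.5):
    """Return the sample offset of the template, or None if nothing scores above threshold.

    The threshold value is an assumption and would need tuning on real recordings.
    """
    offset, score = normalized_xcorr_peak(recording, template)
    return offset if offset is not None and score >= threshold else None


if __name__ == "__main__":
    rng = random.Random(0)
    template = make_chirp(200)
    # Simulated microphone signal: noise, with a quiet copy of the sound at sample 400.
    recording = [rng.gauss(0, 0.1) for _ in range(1000)]
    for i, x in enumerate(template):
        recording[400 + i] += 0.3 * x
    print("detected at sample:", detect(recording, template))
```

This brute-force loop is only practical for short signals. For a one-second clip at audio sample rates, an FFT-based correlation would be the usual choice.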

A similar method, playing a sound on one device and listening for it on another, was used by Microsoft researchers to measure the distance between devices: http://research.microsoft.com/en-us/projects/BeepBeep/

You can experiment with MLS signals using this MATLAB package: http://www.commsp.ee.ic.ac.uk/~mrt102/projects/mls.html
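If MATLAB is not at hand, an MLS can also be generated with a small linear feedback shift register. The following Python sketch is independent of the linked package. The function names `mls` and `circular_autocorr` and the tap choices are assumptions for illustration: taps `(7, 6)` and `(4, 3)` correspond to primitive polynomials, which is what gives the full period of 2^nbits - 1.

```python
def mls(nbits, taps):
    """Generate one period of a maximum length sequence as +1/-1 values.

    Uses a Fibonacci LFSR. `taps` are 1-based register positions that are
    XORed to form the feedback bit; they must correspond to a primitive
    polynomial, e.g. (4, 3) for nbits=4 or (7, 6) for nbits=7.
    """
    state = [1] * nbits  # any non-zero seed works
    seq = []
    for _ in range(2 ** nbits - 1):
        seq.append(1 if state[-1] else -1)  # map bit 1 -> +1, bit 0 -> -1
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]  # shift right, insert feedback bit
    return seq


def circular_autocorr(seq, lag):
    """Circular autocorrelation of seq at the given lag."""
    n = len(seq)
    return sum(seq[i] * seq[(i + lag) % n] for i in range(n))


if __name__ == "__main__":
    seq = mls(7, (7, 6))
    print("length:", len(seq))
    print("autocorrelation at lags 0..4:", [circular_autocorr(seq, k) for k in range(5)])
```

The property that makes MLS signals attractive for detection is visible in the autocorrelation: a single sharp peak at lag 0 and a small constant value at every other lag, so a correlation detector has little chance of locking onto the wrong alignment. SciPy also ships a generator, `scipy.signal.max_len_seq`, if that library is available.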
