I have a recording a few seconds long that contains two similar tones generated at the same frequency, a couple of seconds apart. I would like to detect the end of the first tone and the beginning of the second tone, and work out how many samples apart they are in this audio file. Assume 16-bit signed PCM audio at 48 kHz, held as a byte array of raw samples.
I am struggling to work out how to do this. The approaches I have considered are:
a) Run a DFT to detect the occurrence of the specific frequency of the tone
b) Since the two tones are the loudest parts of the recording, somehow find the amplitude peaks and where each tone starts and ends
c) Run the audio through a band-pass filter to remove all other frequencies, which should leave two non-zero segments in the array
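To make option (a) concrete, here is a minimal sketch of what it might look like, using the Goertzel algorithm (a single-bin DFT) on consecutive fixed-size blocks. It is only an illustration under several assumptions that are not stated above: the audio is mono and little-endian, the tone frequency is known in advance, and a block counts as "tone present" when its power at that frequency is at least a quarter of the strongest block. The function name `tone_gap_samples` and the parameters `block` and `threshold_ratio` are made up for this example.

```python
import math
import struct


def tone_gap_samples(raw, tone_hz, sample_rate=48000, block=480,
                     threshold_ratio=0.25):
    """Return (end_of_first_tone, start_of_second_tone, gap) in samples,
    or None if fewer than two tone bursts are found.

    Assumes mono, little-endian, 16-bit signed PCM (an assumption).
    """
    n = len(raw) // 2
    samples = struct.unpack("<%dh" % n, raw[:2 * n])

    # Goertzel coefficient for the target frequency.
    coeff = 2.0 * math.cos(2.0 * math.pi * tone_hz / sample_rate)

    # Power at tone_hz for each non-overlapping block.
    powers = []
    for start in range(0, n - block + 1, block):
        s1 = s2 = 0.0
        for x in samples[start:start + block]:
            s0 = x + coeff * s1 - s2
            s2, s1 = s1, s0
        powers.append(s1 * s1 + s2 * s2 - coeff * s1 * s2)
    if not powers:
        return None

    # Mark blocks whose power is close enough to the strongest block.
    limit = threshold_ratio * max(powers)
    active = [p > 0.0 and p >= limit for p in powers]

    # Collect runs of consecutive active blocks as (first, one_past_last).
    runs = []
    run_start = None
    for i, is_on in enumerate(active):
        if is_on and run_start is None:
            run_start = i
        elif not is_on and run_start is not None:
            runs.append((run_start, i))
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(active)))
    if len(runs) < 2:
        return None

    end_first = runs[0][1] * block      # first sample after tone 1
    start_second = runs[1][0] * block   # first sample of tone 2
    return end_first, start_second, start_second - end_first
```

With these defaults the result is only accurate to one block (480 samples, or 10 ms at 48 kHz). A smaller block gives finer time resolution but a less selective frequency measurement, so the block size and threshold would need tuning against the real recording and its background noise.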
What is the most straightforward method (feel free to suggest other techniques)?