
I am trying to detect when each beep starts in a recording, using librosa in Python. When I plot the detected onset times, they are offset from the true start of each beep, as shown by the red lines in the figure. This offset changes when the interval between the beeps changes. I want a robust method that detects the start of each beep for variable beep intervals. How can I remove this offset?

[Figure: onset detection and the difference between the intervals]
[Figure: offset in detection]

import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display

x, sr = librosa.load('c.wav')
plt.figure(figsize=(30, 7))
# waveshow replaces waveplot, which was removed in librosa 0.10
librosa.display.waveshow(x, sr=sr)
onset_frames = librosa.onset.onset_detect(y=x, sr=sr, units='frames')
o_env = librosa.onset.onset_strength(y=x, sr=sr)
times = librosa.frames_to_time(np.arange(len(o_env)), sr=sr)

onset_times = librosa.frames_to_time(onset_frames, sr=sr)
a = onset_times[1:-1]  # drop the first and last detections

# Intervals between consecutive onsets
Diff = np.diff(a)

print(Diff)
plt.vlines(a, 0, x.max(), color='r', alpha=0.9,
           linestyle='--', label='Onsets')
plt.axis('tight')
plt.legend(frameon=True, framealpha=0.75)
plt.show()

I have added the sound files at the link below, so the notebook can be run directly. Python Colab Notebook: Link

Masood Salik
  • Are the onsets always as clear as they are here, with sound levels several times that of the background? – Jon Nordby Dec 03 '20 at 10:07
  • Are you sure the offset depends on the incoming audio? On your plot it looks rather constant. Also, what is your requirement for the precision of the timings? – Jon Nordby Dec 03 '20 at 11:59
  • The onsets are usually clear, with a sound level several times that of the background. I have periodic beeps in a single audio file. The time interval between them can be 100 ms, 250 ms, or 500 ms. The figure shown has an interval of 500 ms. My requirement is that, for any beep interval, the code should precisely locate each starting point and precisely measure the interval. One can see in the figure that there is a drift in the selected start point, so the interval calculation also has an error. The beeps can be reproduced directly from the Colab file linked in the post. – Masood Salik Dec 03 '20 at 16:50
  • The onset_backtrack function might be useful https://librosa.org/doc/latest/generated/librosa.onset.onset_backtrack.html#librosa.onset.onset_backtrack – Jon Nordby Dec 03 '20 at 22:24
  • Also, I would recommend using autocorrelation to determine the most likely interval, if you can compute it over a couple of beeps. Median filtering is another alternative. – Jon Nordby Dec 03 '20 at 22:28
  • @jonnor If I use backtrack, I get this [link](https://ibb.co/dfSvv5z). It also has an offset. Taking the average of the raw and backtracked onsets got me a bit closer to my goal, but I am trying to get more precision. – Masood Salik Dec 05 '20 at 19:10
  • It seems that your RMS track does not have enough temporal resolution. Decrease frame_length and hop_length by 4x or 8x. And invert it before passing to backtrack, then backtrack should hit the highest volumes. – Jon Nordby Dec 05 '20 at 23:31
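
The suggestions in the comments above (a finer frame and hop, then a robust interval estimate via the median or autocorrelation) can be sketched roughly as follows. This is a NumPy-only illustration on a synthetic signal, not librosa's onset detector: the helper names (make_beeps, rms_envelope, detect_onsets, interval_by_autocorr), the 256/64 frame and hop sizes, the 10% threshold, and the 0.08 s minimum interval are all assumptions chosen for the example.

```python
import numpy as np

def make_beeps(sr=22050, duration=3.0, interval=0.5, beep_len=0.05,
               freq=1000.0, first=0.2):
    """Hypothetical test signal: sine bursts on low-level noise."""
    rng = np.random.default_rng(0)
    y = 0.005 * rng.standard_normal(int(sr * duration))
    t = np.arange(int(sr * beep_len)) / sr
    burst = np.sin(2 * np.pi * freq * t)
    starts = np.arange(first, duration - beep_len, interval)
    for s in starts:
        i = int(round(s * sr))
        y[i:i + burst.size] += burst
    return y, starts

def rms_envelope(y, frame_length=256, hop_length=64):
    """RMS per frame; frame k covers samples [k*hop, k*hop + frame_length)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    idx = (np.arange(frame_length)[None, :]
           + hop_length * np.arange(n_frames)[:, None])
    return np.sqrt(np.mean(y[idx] ** 2, axis=1))

def detect_onsets(y, sr, frame_length=256, hop_length=64, rel_threshold=0.1):
    """Onset times (s) where the envelope first rises above the threshold."""
    env = rms_envelope(y, frame_length, hop_length)
    above = env > rel_threshold * env.max()
    rising = np.flatnonzero(~above[:-1] & above[1:]) + 1
    # A frame first crosses the threshold when its trailing edge reaches the
    # beep, so report the frame's end; resolution is about one hop.
    return (rising * hop_length + frame_length) / sr

def interval_by_autocorr(y, sr, frame_length=256, hop_length=64,
                         min_interval=0.08):
    """Most likely beep interval (s) from the envelope's autocorrelation."""
    env = rms_envelope(y, frame_length, hop_length)
    e = env - env.mean()
    ac = np.correlate(e, e, mode="full")[len(e) - 1:]  # lags 0, 1, 2, ...
    min_lag = int(min_interval * sr / hop_length)      # skip the lag-0 lobe
    lag = min_lag + int(np.argmax(ac[min_lag:]))
    return lag * hop_length / sr

y, true_starts = make_beeps()
onsets = detect_onsets(y, sr=22050)
print("onsets (s):", np.round(onsets, 4))
print("median interval (s):", np.median(np.diff(onsets)))
print("autocorr interval (s):", interval_by_autocorr(y, sr=22050))
```

With a hop of 64 samples at 22050 Hz, each onset can only be placed to within about 3 ms, so a shorter hop is the lever to pull if tighter timing is needed. The threshold approach relies on the beeps being well above the background, as described in the comments.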

1 Answer


You can use YAMNet, a pretrained audio event classifier that predicts 521 sound classes from the AudioSet corpus. https://www.tensorflow.org/tutorials/audio/transfer_learning_audio

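The offset of the red lines may be caused by a sampling-rate mismatch. Make sure the sampling rate used in the analysis matches that of the beep recording. Note that librosa.load resamples to 22050 Hz by default; pass sr=None to keep the file's native rate.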
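
One way to check whether a sampling-rate mismatch is involved is to look at how the timing error behaves. The sketch below uses a hypothetical helper, frames_to_seconds, that mirrors the basic arithmetic of frame-to-time conversion (frame index times hop length, divided by sample rate); the 44100 Hz and 22050 Hz rates and the hop of 512 are assumed values for illustration.

```python
def frames_to_seconds(frames, sr, hop_length=512):
    """Hypothetical helper: time of each frame start, in seconds."""
    return [f * hop_length / sr for f in frames]

frames = [10, 100, 1000]

# Assume the audio is really at 44100 Hz but times are computed with 22050 Hz.
correct = frames_to_seconds(frames, sr=44100)
wrong = frames_to_seconds(frames, sr=22050)
errors = [w - c for w, c in zip(wrong, correct)]

for f, c, w, e in zip(frames, correct, wrong, errors):
    print(f"frame {f}: correct={c:.4f}s wrong={w:.4f}s error={e:.4f}s")
```

A mismatch of this kind scales every timestamp by the same factor, so the error grows in proportion to the position in the file. If the offset in the plot is roughly the same size at every beep, a sample-rate mismatch is probably not the whole explanation.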