How to compare / match two non-identical sound clips

Question

I need to take short sound samples every 5 seconds, and then upload these to our cloud server.

I then need to find a way to compare / check if that sample is part of a full long audio file.

The samples will be recorded from a phone's microphone, so they will not be exact.

I know this topic can get quite technical and complex, but I am sure there must be some libraries or online services that can assist in this complex audio matching / pairing.

One idea was to use an audio-to-text conversion service and then match on the actual dialogue. However, this does not feel efficient to me, whereas matching on the actual sound frequencies or patterns would be a lot more efficient.

I know there are services out there, such as Shazam, that do this type of audio matching. However, I would imagine their services are all proprietary.

Some factors that could influence it:

Both audio samples will be timestamped, so we do not have to search through the entire sound clip.

Scott Stensland · Answer 1 · 2020-05-31T10:12:03.590

To get traction on an answer, focus on an answerable question where you have already done battle, and show your code.

Off the top of my head: I would walk across the audio and pluck out a bucket of several samples, then slide the bucket forward by a few samples and pluck again. Let each bucket overlap with both the previous bucket and the next one. Fewer samples per bucket means quicker computation; more samples means greater accuracy, up to a point. YMMV.
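The bucket walk can be sketched in a few lines of Python. This is only a minimal illustration: the names `overlapping_buckets`, `bucket_size` and `hop` are my own, and the numbers are toy values rather than a recommendation.

```python
from typing import Iterator, List, Sequence


def overlapping_buckets(
    samples: Sequence[float], bucket_size: int, hop: int
) -> Iterator[List[float]]:
    """Yield fixed-size buckets; neighbours share bucket_size - hop samples."""
    if bucket_size <= 0 or hop <= 0:
        raise ValueError("bucket_size and hop must be positive")
    # Stop once a full bucket no longer fits; the ragged tail is dropped.
    for start in range(0, len(samples) - bucket_size + 1, hop):
        yield list(samples[start:start + bucket_size])


# Toy input: ten "samples", buckets of four, sliding two at a time.
buckets = list(overlapping_buckets(list(range(10)), bucket_size=4, hop=2))
for bucket in buckets:
    print(bucket)
```

A hop smaller than the bucket size is what creates the overlap; a hop equal to the bucket size would give back-to-back buckets with none.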

Feed each bucket into a Fourier transform to convert the time-domain audio into its frequency-domain counterpart. Record the salient attributes of each bucket's FFT in a database, such as the X frequencies with the most energy (the greatest magnitudes in the FFT).
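As a rough sketch of the "top X frequencies" idea, here is a plain-Python discrete Fourier transform that picks the strongest bins of one bucket. It uses only the standard library so it runs anywhere; on real audio you would likely swap in a proper FFT such as `numpy.fft.rfft`. The function names, the 8000 Hz sample rate and the test tones are assumptions for illustration.

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitudes(bucket: Sequence[float]) -> List[float]:
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(bucket)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(bucket))
        magnitudes.append(abs(acc))
    return magnitudes


def top_frequencies(bucket: Sequence[float], sample_rate: float,
                    top_x: int) -> List[float]:
    """Frequencies in Hz of the top_x strongest bins, ignoring the DC bin."""
    mags = dft_magnitudes(bucket)
    n = len(bucket)
    strongest = sorted(range(1, len(mags)), key=lambda k: mags[k],
                       reverse=True)[:top_x]
    return sorted(k * sample_rate / n for k in strongest)


# Synthetic bucket: a 500 Hz tone plus a quieter 1250 Hz tone (assumed values).
SAMPLE_RATE = 8000
bucket = [math.sin(2 * math.pi * 500 * i / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 1250 * i / SAMPLE_RATE)
          for i in range(256)]
peaks = top_frequencies(bucket, SAMPLE_RATE, top_x=2)
print(peaks)
```

The list of peak frequencies per bucket is the kind of attribute you would write to the database, keyed by which audio file and which bucket position it came from.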

Perhaps also store the standard deviation of those top X frequencies with respect to their energy (how dispersed those frequencies are), and define additional attributes as needed. For such a frequency-domain approach to work you need relatively few samples in each bucket: the FFT treats its input as periodic time-series data, so if you feed it 500 milliseconds of complex audio like speech or music you no longer have periodic audio, instead you have mush.
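One way to read "standard deviation with respect to energy" is an energy-weighted standard deviation of the peak frequencies. That reading is my assumption, as are the name `weighted_frequency_spread` and the choice of squared magnitude as the weight.

```python
import math
from typing import Sequence


def weighted_frequency_spread(freqs: Sequence[float],
                              magnitudes: Sequence[float]) -> float:
    """Energy-weighted standard deviation of the given peak frequencies (Hz)."""
    if len(freqs) != len(magnitudes):
        raise ValueError("freqs and magnitudes must have the same length")
    weights = [m * m for m in magnitudes]  # energy taken as magnitude squared
    total = sum(weights)
    if total == 0:
        return 0.0
    mean = sum(w * f for w, f in zip(weights, freqs)) / total
    variance = sum(w * (f - mean) ** 2 for w, f in zip(weights, freqs)) / total
    return math.sqrt(variance)


# Two equally strong peaks 200 Hz apart sit 100 Hz either side of their mean.
spread = weighted_frequency_spread([100.0, 300.0], [1.0, 1.0])
print(spread)
```

A small spread means the energy is bunched around one frequency; a large spread means it is smeared across the band. Either way it is one more number to store alongside each bucket's peaks.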

Once all existing audio has been through the above processing, do the same to your live new audio, then identify which prior audio contains the most similar sequence of buckets to your current input. Use a Bayesian approach so your guesses carry probabilistic weights, which lend themselves to real-time updates.
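To make the Bayesian part concrete, here is a toy sketch that keeps a probability for each candidate file and updates it as every new bucket arrives. Everything in it is an assumption for illustration: fingerprints are modelled as sets of peak bin indices, the likelihood is the best Jaccard overlap with any stored bucket, and the `noise_floor`, the file names and the data are made up. It also ignores bucket order, which a real matcher would want to use.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # hypothetical: peak bin indices of one bucket


def bucket_likelihood(query: Fingerprint, reference: List[Fingerprint],
                      noise_floor: float = 0.05) -> float:
    """Best Jaccard overlap between the query bucket and any reference bucket."""
    best = 0.0
    for ref in reference:
        union = query | ref
        if union:
            best = max(best, len(query & ref) / len(union))
    # The floor stops one noisy bucket from zeroing out a candidate for good.
    return max(best, noise_floor)


def update_posterior(posterior: Dict[str, float], query: Fingerprint,
                     library: Dict[str, List[Fingerprint]]) -> Dict[str, float]:
    """One Bayes step: multiply by the likelihood, then renormalise."""
    scaled = {name: p * bucket_likelihood(query, library[name])
              for name, p in posterior.items()}
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


library = {
    "talk_a": [frozenset({3, 17, 40}), frozenset({5, 17, 42}),
               frozenset({8, 21, 42})],
    "talk_b": [frozenset({2, 11, 30}), frozenset({2, 12, 33}),
               frozenset({6, 12, 35})],
}
posterior = {name: 1.0 / len(library) for name in library}  # uniform prior
for live_bucket in [frozenset({5, 17, 42}), frozenset({8, 21, 40})]:
    posterior = update_posterior(posterior, live_bucket, library)
best_match = max(posterior, key=posterior.get)
print(best_match, posterior)
```

Since the clips are timestamped, the `library` lookup could be limited to stored buckets near the expected time, which shrinks the search and cuts down on false matches.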

Sounds like a very cool project, good luck. Here are some audio fingerprint resources:

Does audio clip A appear in audio file B: Detecting audio inside audio [Audio Recognition]

Detecting a specific pattern from a FFT in Arduino

Audio Fingerprinting using the AudioContext API https://news.ycombinator.com/item?id=21436414 https://iq.opengenus.org/audio-fingerprinting/

Chromaprint is the core component of the AcoustID project. It's a client-side library that implements a custom algorithm for extracting fingerprints from any audio source https://acoustid.org/chromaprint

Audio landmark fingerprinting as a Node Stream module - nodejs converts a PCM audio signal into a series of audio fingerprints. https://github.com/adblockradio/stream-audio-fingerprint

SO followup: How to compare / match two non-identical sound clips

Audio fingerprinting and recognition in Python https://github.com/worldveil/dejavu

Audio Fingerprinting with Python and Numpy http://willdrevo.com/fingerprinting-and-audio-recognition-with-python/

MusicBrainz: an open music encyclopedia (musicbrainz.org) https://news.ycombinator.com/item?id=14478515

How does Chromaprint work? https://oxygene.sk/2011/01/how-does-chromaprint-work/

https://acoustid.org/

MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public. https://musicbrainz.org/

Audio Matching (Audio Fingerprinting)

Is it possible to compare two similar songs given their wav files?

audio hash https://en.wikipedia.org/wiki/Hash_function#Finding_similar_records

audio fingerprint https://encrypted.google.com/search?hl=en&pws=0&q=python+audio+fingerprinting

ACRCloud https://www.acrcloud.com/ How to recognize a music sample using Python and Gracenote?
