How to get a list of notes present in a wav file?

Question

I am writing a program to help people learn guitar. To do this, I need to be able to look at a sample of time and see what note(s) they played. I looked at FFTW but I don't understand how to get this to work. I also tried to figure out the Goertzel algorithm but it seems like that is just for single-frequency notes like dial tones (not sure about that though). To be clear, I do need to be able to detect multiple notes (to see if a chord is played), but it doesn't matter too much if a few harmonics get in there too.

I'm coding this in C++, and would prefer a solution that is cross-platform.

UPDATE: I've realized it isn't so important to detect specific notes; what I really need is to check that certain frequencies are present, and that others aren't. For example, if someone plays a C, I want to check that a C frequency is present (about 262 Hz), as well as probably 524 Hz and 786 Hz, and check that nearby notes that are not near in the overtone series (like B and D) are not present.

This is a massive request. Can you narrow it down to a smaller subset of what you need? – Linuxios Jul 09 '12 at 01:13
Sure. Basically, I need to be able to tell whether a specific frequency is present in a sample, and check that others are not (or at least are relatively low volume). – Skyler Jul 09 '12 at 02:40
I am working on something similar. I start with transcribing (teaching the computer to "hear") and then the teacher... I use existing VAMP plugins to do the frequency work. Feel free to contact/join! – relascope Aug 05 '15 at 04:35
IMHO, regarding your update: it *is* important to check for the notes. In the final program it is very likely that the music to be played is presented in note form (e.g. MusicXML). You don't want to compare notes to frequencies. The calculation is easy anyway: freq = 440 * 2^(i/12), where i is the number of semitones from A4 (440 Hz). – relascope Aug 05 '15 at 04:40
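
The note-to-frequency calculation from the last comment can be sketched in C++ as follows. This assumes equal temperament with A4 tuned to 440 Hz; the function name `noteFrequency` is hypothetical and only for illustration.

```cpp
#include <cmath>

// Equal-tempered frequency of the note `semitonesFromA4` semitones away
// from A4 (assumed to be 440 Hz). Negative values go below A4,
// e.g. -9 is middle C (C4).
double noteFrequency(int semitonesFromA4) {
    return 440.0 * std::pow(2.0, semitonesFromA4 / 12.0);
}
```

For example, `noteFrequency(-9)` is roughly 261.6 Hz, which matches the "about 262 Hz" for C mentioned in the question.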

score 3 · Accepted Answer · answered Jul 09 '12 at 01:51

Notes are not present in a wav file. Sampled sound is.

Humans might perceive notes that were played to create the sound in a wav file. But automatic polyphonic pitch estimation, that is, turning recorded sound into transcribed music for rich and complex waveforms such as those produced by guitars, still appears to be an advanced research topic.

When possible for certain very restricted types of music sounds, some non-trivial DSP will be involved. FFTW might be useful for a small part of the more sophisticated DSP processing needed for pitch estimation, Goertzel filtering less so.

hotpaw2


I know that notes aren't present in a wav file, which is why I know I have to do some sort of processing. However, I also know there are software tuners which can tell what note you are playing, so obviously this is not impossible to do. – Skyler Jul 09 '12 at 02:41
A tuner showing the pitch of a long held note is completely different from analyzing more complicated and fast, polyphonic content. What you want is technology that commercial companies have developed for years and you would be looking at $$$$$$ to license it or spend those same many many years yourself implementing it. – Xenakios Jul 09 '12 at 08:31
Darn, with all the software that does things along those lines I was sure there would be an open-source library of some kind. I was looking at the [MIREX](http://www.music-ir.org/mirex/wiki/2009:Multiple_Fundamental_Frequency_Estimation_%26_Tracking) work, where it seems there was a competition for this, but I can't find anywhere that the algorithms are actually posted... *sigh* – Skyler Jul 09 '12 at 14:07
There IS a lot of software that does things "along those lines", but a vast amount of it does not work well, including all those guitar tuners that use only Goertzel or FFTW magnitude directly for note pitch. Your UPDATE is similar to the Harmonic Product Spectrum method of monophonic pitch detection or estimation, which may work a lot better for some sound sources. – hotpaw2 Jul 10 '12 at 07:27
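
For reference, the Harmonic Product Spectrum idea mentioned in the last comment can be sketched roughly like this. It is a generic, monophonic illustration, not hotpaw2's implementation: the names `dftMagnitudes` and `hpsPeakBin` are hypothetical, the DFT is computed by slow direct summation to keep the sketch library-free, and no windowing is applied.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitudes of DFT bins 0..N/2, computed by direct summation.
// O(N^2), so suitable for a sketch only; a real program would use an FFT.
std::vector<double> dftMagnitudes(const std::vector<double>& x) {
    const double kPi = 3.14159265358979323846;
    const std::size_t n = x.size();
    std::vector<double> mag(n / 2 + 1, 0.0);
    for (std::size_t k = 0; k < mag.size(); ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = 2.0 * kPi * k * j / n;
            re += x[j] * std::cos(angle);
            im -= x[j] * std::sin(angle);
        }
        mag[k] = std::sqrt(re * re + im * im);
    }
    return mag;
}

// Harmonic Product Spectrum: for each candidate bin k, multiply the
// magnitudes at k, 2k, ..., harmonics*k and return the k with the
// largest product. Returns 0 if no candidate bin fits in the spectrum.
std::size_t hpsPeakBin(const std::vector<double>& mag, std::size_t harmonics) {
    std::size_t best = 0;
    double bestProduct = 0.0;
    for (std::size_t k = 1; k * harmonics < mag.size(); ++k) {
        double product = 1.0;
        for (std::size_t h = 1; h <= harmonics; ++h) {
            product *= mag[k * h];
        }
        if (product > bestProduct) {
            bestProduct = product;
            best = k;
        }
    }
    return best;
}
```

The point of the product is that a fundamental whose second harmonic is louder than itself can still win, because only the true fundamental has energy at all of k, 2k and 3k. Picking the single largest magnitude would choose the harmonic instead.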

score 0 · Answer 2 · edited May 23 '17 at 11:58

I can't point you to specifics but I believe what you need would be a Fourier transform to detect the frequency you're looking for. There's also a similar question here

answered Jul 09 '12 at 18:20

Jay

I know that I need something like a Fourier transform, the problem is I have no idea how to implement that. The FFTW examples aren't terribly instructive. – Skyler Jul 10 '12 at 00:18
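
For the narrower check described in the question's update (is one particular frequency present, and are its neighbours absent?), a full FFT is not strictly needed. Below is a minimal sketch of the Goertzel algorithm in its generic textbook form, which computes the power of a single DFT bin. The function name `goertzelPower` and its parameters are hypothetical, and it does not use or imitate the FFTW API.

```cpp
#include <cmath>
#include <vector>

// Goertzel algorithm: squared magnitude of the DFT bin nearest to targetHz.
// `samples` is one block of mono audio, `sampleRate` is in Hz.
double goertzelPower(const std::vector<double>& samples,
                     double sampleRate, double targetHz) {
    const double kPi = 3.14159265358979323846;
    const double n = static_cast<double>(samples.size());
    // Index of the DFT bin closest to the requested frequency.
    const double k = std::round(n * targetHz / sampleRate);
    const double coeff = 2.0 * std::cos(2.0 * kPi * k / n);
    double s1 = 0.0, s2 = 0.0;  // the two previous filter states
    for (double x : samples) {
        const double s0 = x + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}
```

A possible use is to call this once for each frequency that should be present (the note and a few harmonics) and once for each that should be absent, then compare the powers against each other. The threshold for "present" is application-specific, and on real recordings a window function would likely be needed because the note will not fall exactly on a bin.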

Scott Izu · Answer 3 · 2014-04-28T04:13:39.400

What about this pdf? http://miracle.otago.ac.nz/tartini/papers/A_Smarter_Way_to_Find_Pitch.pdf

The problem with the FFT is that a 256-sample FFT gives you only 256 outputs. Essentially, it divides your frequency space, which contains an infinite number of frequencies, into a limited set of frequencies.

This is because if you only check 256 samples (256 can be replaced by N, the number of samples used for the FFT), any two frequencies that differ by a multiple of 256 will look the same.

In other words, suppose you take 256 evenly spaced samples at times 0, 1/256, 2/256, 3/256, ..., 255/256 seconds. Then the two signals sin(2 pi 80 x), which has frequency 80 cycles/sec, and sin(2 pi (80 + 9*256) x), which has frequency 80 + 9*256 cycles/sec, will have the same samples.

Here, 9 can be replaced by k, the multiple to use. You could replace 9 with 1,2,3,4,5, etc. You can replace 256 (N) with any value as well.

As an example, at the sample time x = 200/256 we have: sin(2 pi (80 + 9*256) (200/256)) = sin(2 pi 80 (200/256) + 2 pi * 9 * 200)

Because adding an integer multiple of 2 pi doesn't change sin, this is the same as sin(2 pi 80 (200/256)).

More generally, sin(2 pi (M + k*N) (j/N)) = sin(2 pi M (j/N) + 2 pi k*j) = sin(2 pi M (j/N)), where N is the number of samples, j is any integer 0, ..., N - 1, (j/N) is the sample time, M is the number of cycles per second, and k is any integer (..., -2, -1, 0, 1, 2, ...).
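
The identity above can be checked numerically. This is a small sketch with hypothetical helper names (`sineSample`, `maxSampleDifference`), assuming N samples taken over one second as in the example:

```cpp
#include <cmath>
#include <cstddef>

// Sample j (out of n samples spread evenly over one second) of a unit
// sine wave at freqHz cycles per second.
double sineSample(double freqHz, std::size_t j, std::size_t n) {
    const double kPi = 3.14159265358979323846;
    return std::sin(2.0 * kPi * freqHz * static_cast<double>(j) / static_cast<double>(n));
}

// Largest absolute difference between the n samples of two sine waves.
// A result near zero means the two frequencies are indistinguishable
// from these samples alone.
double maxSampleDifference(double freqA, double freqB, std::size_t n) {
    double worst = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        const double diff = std::fabs(sineSample(freqA, j, n) - sineSample(freqB, j, n));
        if (diff > worst) worst = diff;
    }
    return worst;
}
```

`maxSampleDifference(80, 80 + 9 * 256, 256)` comes out at rounding-error level, while `maxSampleDifference(80, 81, 256)` does not, because 80 and 81 do not differ by a multiple of N.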

From Nyquist sampling, if you want to distinguish -128, -127, -126, -125, ..., 125, 126, 127 cycles per second, you would take 256 samples/sec. 256 samples/sec means distinguishing 256 frequencies. However, 0 cycles/sec, 256 cycles/sec, 512 cycles/sec and 1024 cycles/sec would all look the same.
