Audio Analysis for Sheet Music

Question

I'm currently working on a program that analyses a WAV file of a solo musician playing an instrument and detects the notes within it. To do this it performs an FFT and then looks at the data produced. The goal is to (at some point) produce the sheet music by writing a MIDI file.

I just wanted to get a few opinions on what might be difficult about it, whether anyone has tried it before, and maybe a few things it would be good to research. At the moment my biggest struggle is that not all notes are purely one frequency, and I cannot yet detect chords, just single notes. Also, there has to be a pause between the notes I am detecting so I know for sure that one has ended and the next has started. Any comments on this would also be very welcome!

This is the code I use when a new frame comes in from the signal. It looks for the frequency that is most dominant in the sample:

    // Frequency (Hz) of each FFT bin, used to pair frequencies with powers
    double[] frequencyVectorDoubleArray = Accord.Audio.Tools.GetFrequencyVector(waveSignal.Length, waveSignal.SampleRate);

    powerSpectrumDoubleArray[0] = 0; // zero the DC component

    // Column 0 holds the bin frequency, column 1 the power in that bin
    double[,] frequencyPowerDoubleArray = new double[powerSpectrumDoubleArray.Length, 2];

    for (int i = 0; i < powerSpectrumDoubleArray.Length; i++)
    {
        // Bins at or below 15 Hz are skipped, so their rows stay zero
        if (frequencyVectorDoubleArray[i] > 15.00)
        {
            frequencyPowerDoubleArray[i, 0] = frequencyVectorDoubleArray[i];
            frequencyPowerDoubleArray[i, 1] = powerSpectrumDoubleArray[i];
        }
    }

    // Find the bin with the highest power (the dominant frequency) in this frame
    // But I want to filter out stuff
    pulsePowerDouble = lowestPowerAcceptedDouble; // only peaks above this threshold count
    int frequencyIndexAtPulseInt = 0;
    int oldFrequencyIndexAtPulse = 0;
    // GetLength(0) is the number of rows (bins) in the 2D array
    for (int j = 0; j < frequencyPowerDoubleArray.GetLength(0); j++)
    {
        if (frequencyPowerDoubleArray[j, 1] > pulsePowerDouble)
        {
            // Keep the previous best before overwriting it
            oldPulsePowerDouble = pulsePowerDouble;
            pulsePowerDouble = frequencyPowerDoubleArray[j, 1];

            oldFrequencyIndexAtPulse = frequencyIndexAtPulseInt;
            frequencyIndexAtPulseInt = j;
        }
    }
    foundFreq = frequencyPowerDoubleArray[frequencyIndexAtPulseInt, 0];

See http://stackoverflow.com/questions/435533/detecting-the-fundamental-frequency – mtrw, May 29 '11 at 02:00
"not all notes are purely one frequency": probably almost no note from a common musical instrument is purely one frequency. BTW, pure sine tones sound quite annoying to the human ear and are therefore rather rare in music. – Curd, May 30 '11 at 08:46
@Curd - good point, I had not thought of it in terms of sine waves vs. musical note waves. – Nyx, Jun 02 '11 at 04:27

hotpaw2 · Accepted Answer · 2011-05-30T07:10:07.733

4

1) There is a lot (several decades worth) of research literature on frequency estimation and pitch estimation (which are two different subjects).

2) Peak FFT frequency is not the same as the musical pitch. Some solo musical instruments can produce well over a dozen frequency peaks for just one note, let alone a chord, with none of the largest peaks anywhere near the musical pitch. For some common instruments, the peaks might not even be mathematically exact harmonics.

3) Using the peak bin of a short unwindowed FFT isn't a great frequency estimator.

4) Note onset detection might require some sophisticated pattern matching, depending on the instrument.
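Point 3 can be illustrated with a minimal sketch, assuming a Hann window followed by parabolic interpolation of the log magnitudes around the peak bin. This is one common refinement, not the only one, and it estimates a spectral peak, not the musical pitch (see point 2). The names `PeakEstimator`, `ApplyHann`, `Magnitudes` and `EstimatePeakHz` are hypothetical, and the naive DFT is only a stand-in for a real FFT library so that the example is self-contained.

```csharp
using System;

public static class PeakEstimator
{
    // Multiply the frame by a Hann window to reduce spectral leakage.
    public static double[] ApplyHann(double[] frame)
    {
        int n = frame.Length;
        var windowed = new double[n];
        for (int i = 0; i < n; i++)
        {
            double w = 0.5 * (1.0 - Math.Cos(2.0 * Math.PI * i / (n - 1)));
            windowed[i] = frame[i] * w;
        }
        return windowed;
    }

    // Naive O(n^2) DFT magnitudes for bins 0..n/2 (illustration only;
    // a real program would call an FFT routine instead).
    public static double[] Magnitudes(double[] x)
    {
        int n = x.Length;
        var mag = new double[n / 2 + 1];
        for (int k = 0; k < mag.Length; k++)
        {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++)
            {
                double angle = 2.0 * Math.PI * k * t / n;
                re += x[t] * Math.Cos(angle);
                im -= x[t] * Math.Sin(angle);
            }
            mag[k] = Math.Sqrt(re * re + im * im);
        }
        return mag;
    }

    // Find the largest interior bin, then fit a parabola through the log
    // magnitudes of that bin and its two neighbours to get a fractional bin.
    public static double EstimatePeakHz(double[] frame, double sampleRate)
    {
        double[] mag = Magnitudes(ApplyHann(frame));
        int peak = 1;
        for (int k = 1; k < mag.Length - 1; k++)
        {
            if (mag[k] > mag[peak]) peak = k;
        }

        // Small constant avoids taking the log of zero.
        double a = Math.Log(mag[peak - 1] + 1e-12);
        double b = Math.Log(mag[peak] + 1e-12);
        double c = Math.Log(mag[peak + 1] + 1e-12);
        double denom = a - 2.0 * b + c;
        double offset = denom == 0.0 ? 0.0 : 0.5 * (a - c) / denom;

        return (peak + offset) * sampleRate / frame.Length;
    }
}
```

For a 1024-sample frame at 8000 Hz the bins are about 7.8 Hz apart, so the raw peak bin for a 440 Hz sine sits at 437.5 Hz, while the interpolated estimate should land within a fraction of a hertz of 440 Hz.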

edited May 30 '11 at 07:10

answered May 30 '11 at 06:50

hotpaw2


Hi. Thank you, I am now using a window as well (Hann) and have been considering hidden Markov models for pattern matching. The window greatly improves the accuracy of the frequencies I find. RE 2: that is indeed a problem. Currently I'm kinda solving part of it by checking for occurrences of the lower octaves of the most powerful frequency and the highest frequency in a set of samples. – Nyx Jun 02 '11 at 04:21

score 1 · Answer 2 · answered May 29 '11 at 01:19


You don't want to focus on the highest frequency, but rather the lowest. Every note from any musical instrument is full of harmonics. Expect to see the fundamental and every octave above it (2x, 4x, ...), plus the other harmonics such as 3x and 5x the fundamental.

Harmonics are what make a trumpet sound different from a trombone when they are both playing the same note.
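One common way to look for the fundamental rather than the loudest partial is time-domain autocorrelation, which searches for the shortest lag at which the waveform repeats. The following is a minimal sketch under that assumption, not a method prescribed in this answer. The names `AutocorrelationPitch`, `HarmonicTone` and `EstimateHz` are hypothetical, and the search range in Hz is something the caller has to choose for the instrument.

```csharp
using System;

public static class AutocorrelationPitch
{
    // Synthesise a test tone: amplitudes[h] is the level of harmonic (h + 1).
    public static double[] HarmonicTone(double f0, double[] amplitudes,
                                        double sampleRate, int length)
    {
        var x = new double[length];
        for (int i = 0; i < length; i++)
        {
            for (int h = 0; h < amplitudes.Length; h++)
            {
                x[i] += amplitudes[h] *
                        Math.Sin(2.0 * Math.PI * (h + 1) * f0 * i / sampleRate);
            }
        }
        return x;
    }

    // Estimate the fundamental by finding the lag (in samples) with the
    // largest autocorrelation, restricted to lags between minHz and maxHz.
    public static double EstimateHz(double[] frame, double sampleRate,
                                    double minHz, double maxHz)
    {
        int minLag = Math.Max(1, (int)(sampleRate / maxHz));
        int maxLag = Math.Min((int)(sampleRate / minHz), frame.Length - 1);

        int bestLag = minLag;
        double best = double.NegativeInfinity;
        for (int lag = minLag; lag <= maxLag; lag++)
        {
            // Unnormalised sum: it shrinks at longer lags because fewer
            // samples overlap, which favours the shortest period over
            // its multiples (octave-down errors).
            double sum = 0.0;
            for (int i = 0; i + lag < frame.Length; i++)
            {
                sum += frame[i] * frame[i + lag];
            }
            if (sum > best)
            {
                best = sum;
                bestLag = lag;
            }
        }
        return sampleRate / bestLag;
    }
}
```

For example, `HarmonicTone(200.0, new[] { 0.3, 1.0, 0.5 }, 8000.0, 2048)` builds a tone whose loudest partial is at 400 Hz, yet `EstimateHz(tone, 8000.0, 50.0, 1000.0)` should report the 200 Hz fundamental. Real recordings are noisier and less periodic than this synthetic tone, so treat it as a starting point only.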

answered May 29 '11 at 01:19

fishtoprecords


Keep in mind though that often the fundamental will be missing (http://en.wikipedia.org/wiki/Missing_fundamental). – mtrw May 29 '11 at 01:59
Yes, the code above only takes the loudest frequency in the sample. Later I check whether there is an occurrence of the lower frequency so that, for example, if I am playing A4 and get data back like this: 440, 880, 880, 880, 880 (showing A5 as the more likely note), I will assume it is A4 because there is at least one occurrence of A4. However, I have found that the lower the notes get, the harder it is to get any occurrences, because they are not loud enough. – Nyx May 29 '11 at 08:05
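The octave check described in this comment can be sketched roughly as follows, assuming equal temperament with A4 = 440 Hz = MIDI note 69 (so MIDI 60 is C4). The names `NoteMapper`, `FrequencyToMidi`, `MidiToName` and `PreferLowerOctave` are hypothetical and are not taken from the asker's program; the "at least one occurrence" rule is simply the heuristic as stated in the comment.

```csharp
using System;
using System.Collections.Generic;

public static class NoteMapper
{
    private static readonly string[] Names =
        { "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B" };

    // Nearest MIDI note number for a frequency in Hz (A4 = 440 Hz = 69).
    public static int FrequencyToMidi(double hz)
    {
        return (int)Math.Round(69.0 + 12.0 * Math.Log(hz / 440.0, 2.0));
    }

    // Note name with octave for a non-negative MIDI number, e.g. 69 -> "A4".
    public static string MidiToName(int midi)
    {
        int octave = midi / 12 - 1;
        return Names[midi % 12] + octave;
    }

    // Take the most frequent note across the frames, then step down an
    // octave for as long as the lower octave was detected at least once.
    // Returns -1 if no frequencies were supplied.
    public static int PreferLowerOctave(double[] detectedHz)
    {
        var counts = new Dictionary<int, int>();
        foreach (double hz in detectedHz)
        {
            int midi = FrequencyToMidi(hz);
            int seen;
            counts.TryGetValue(midi, out seen); // seen is 0 if not present
            counts[midi] = seen + 1;
        }

        int best = -1;
        int bestCount = 0;
        foreach (KeyValuePair<int, int> entry in counts)
        {
            if (entry.Value > bestCount)
            {
                best = entry.Key;
                bestCount = entry.Value;
            }
        }

        while (counts.ContainsKey(best - 12))
        {
            best -= 12;
        }
        return best;
    }
}
```

With the readings from the comment, `PreferLowerOctave(new[] { 440.0, 880.0, 880.0, 880.0, 880.0 })` picks MIDI 81 (A5) as the most frequent note and then steps down to 69 (A4) because A4 occurs once. A single spurious low reading would also trigger the step down, so a minimum count may be worth adding.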

score 1 · Answer 3 · answered Jun 30 '11 at 09:24

Unfortunately, this is an extremely hard problem; some of the reasons have already been given. I would start with a literature search (Google Scholar, for instance) for "musical note identification".

If this isn't a spare-time project, beware: I have seen master's theses founder on this particular shoal without getting any useful results.
