0

I have been looking at different methods of detecting the pitch of a tone sung into the microphone.

Seeing as I want to find how closely it resonates with a particular pitch class, I wonder if I could do some sort of physics-based resonance algorithm.

If you hold down to sustain pedal on the piano, and sing a tone into it, (and if you are close enough to one of its existing pitches) a note will resonate sympathetically.

I would love to be able to model this behaviour. But how would I go about the task? Can anyone help me move this forward?

P i
  • 29,020
  • 36
  • 159
  • 267
  • 1
    To isolate the input frequencies you will need the Fourier decomposition. Do you know how to do that? Just don't want to post something obvious to you! – Dr. belisarius Nov 01 '10 at 11:57
  • Nothing would be too obvious for me at this point, so please fire away! – P i Nov 01 '10 at 20:12
  • http://stackoverflow.com/questions/3438429/peak-detection-in-performous-code/4209242#4209242 this thread links to an article containing working source code that solves the problem beautifully! – P i Nov 19 '10 at 00:54

6 Answers6

2

Take a look at the autocorrelation function.

midtiby
  • 14,550
  • 6
  • 34
  • 43
2

One interesting solution I found is simply feeding the microphone input into a Karplus Strong algorithm.

So Karplus Strong simulates a plucked string by:

  • creating a circular buffer (if we are sampling at 44.1 kHz and we wish to simulate the middle A ie A4 which is 440Hz, then our buffer size would be ~101 elements)
  • filling it full of static between -1 and 1
  • walking the circle, each time setting the current value to the average of the previous two, ( and emitting current value to the speaker )
  • a dampening constant can be added

Now if we add the microphone stream into this process, so:

x = ( ringBuf[prev] + ring theBuf[prev2] ) * 0.5 * 0.998;
micOut[moPtr++] = x;
ringBuf[curr] = x + micIn[miPtr++];

It actually simulates singing into a guitar really accurately. if you get your tone spot on, it truly wails.

But there is a severe problem with this approach: consider the pitch generated by a buffer of 100 elements, and that generated by a buffer of 101 elements. there is no way to generate any pitch in between these two values. we are limited to a discrete working set Of pitches. while this is going to be pretty accurate for low notes (A2 would have a buffer length of ~400), the higher we go the more the error: A7 would have a buffer length of ~12.5. That error is probably over a semitone.

I cannot see any way of countering this problem. I think the approach has to be dropped.

P i
  • 29,020
  • 36
  • 159
  • 267
1

An algorithm based entirely on a discrete fourier transform (DFT) has a number of drawbacks. One problem is the temporal resolution, since the DFT works on samples within a window, you cannot determine pitch changes within that window. Another problem is the discrete logarithmic frequency resolution of DFT which might not be good enough for a pitch detector. After all a DFT only finds waves with integer wavelengths of the window size.

A slightly advanced algorithm could do something like this:

  1. Roughly detect pitch frequency (could be done with DFT).
  2. Bandpass signal to filter isolate pitch frequency.
  3. Count the number of samples between two peaks in the filtered signals.

By counting the number of samples you get a pitch resolution matching the sample frequency. If you want even higher resolution than the sample frequency, you could fit a function, such as a polynomial, to the samples around the peak point. Since you have suppressed other frequencies, you should be able to do that.

As another answer suggests, you can also use auto-correlation to find maximum signal repetition within a signal. However I should say that it is not trivial to implement a good auto-correlation pitch detector. Without knowing it I would assume that guitar-tuners and similar cheap electronics base their algorithm on a band filter combined with counting the sample distance between peaks.

Holstebroe
  • 4,993
  • 4
  • 29
  • 45
1

You can use a dampened harmonic oscillator with the input as driving force. Choose the parameters of the oscillator so that it's resonance frequency matches the frequency you want.

You'll find an analysis of the dampened harmonic oscillator in most theoretical physics books on mechanics.

CodesInChaos
  • 106,488
  • 23
  • 218
  • 262
1

One approach I've found to be helpful is to generate two reference waves 90 degrees apart (I call them "sine" and "cosine") and take the dot product of the input waveform with those reference waves over some fairly short (say 1/60 second) stretches of the input. That will give you a somewhat noisy indicator of how much of the input frequency you have that's in phase or out of phase with regard to your reference waves (the square root of the sum of the squares of the values generated using the two reference waves will be the amplitude). With a small window size, you'll notice that the output is rather noisy, but if you filter the output with something like a simple FIR or IIR filter you should probably get something pretty reasonable.

One nice trick is to generate two amplitude numbers: for the first one, run the sine and cosine amplitudes through two rounds of filtering, then compute the sum of the squares. For the second, run the amplitudes through one round of filtering, then compute the sum of the squares, and then run that through another round of filtering.

Both amplitude measurements will experience the same delay, but the first one will be much more selective than the second; you can thus tell very clearly whether a frequency is 'right on' or is a bit off. Using this approach, it's possible to detect DTMF tones quickly while rejecting tones that are even a few Hz off (off-pitch tones will pick up much more strongly on the 'loose' detector than the tight one).

Sample code:

double sine_phase,sine_freq;
void process_some_waves(double *input, int16 len, 
  double *sine_phase, double sine_freq, 
  double *sine_result, double *cosine_result)
{
  int i;
  double phase, sin_tot,cos_tot;
  phase = *sine_phase;
  sin_tot = cos_tot = 0;
  for (i=0; len > i; i++)
  {
    sin_tot += input[i] * sin(phase);
    cos_tot += input[i] * cos(phase);
    phase += sine_freq;
  }
  *sine_result = sin_tot;
  *cosine_result = cos_tot;
  *sine_phase = phase;
}

/* Takes first element in buffer and 'smears' it through buffer with simple Gaussian resp. */
void simple_fir_filter(double *buff, int buffsize)
{
  int i;

  for (i=buffsize-1; i>=2; i--)
    buff[i] = (buff[i-1] + buff[i-2])/2;
}

#define FILTER_SIZE1 10
#define FILTER_SIZE2 8
#define SECTION_LENGTH 128
#define FREQ whatever

double sine_buff1[FILTER_SIZE1], sine_buff2[FILTER_SIZE2];
double cos_buff1[FILTER_SIZE1], cos_buff2[FILTER_SIZE2];
double combined_buff[FILTER_SIZE2];
double tight_amplitude, loose_amplitude;
double ref_phase;

void handle_some_data(double *input)
{
  /* Put results in first element of filter buffers */
  process_some_waves(input, SECTION_LENGTH, &ref_phase, FREQ, sine_buff1, cos_buff1);

  /* Run first stage of filtering */

  simple_fir_filter(sine_buff1, FILTER_SIZE1); 
  simple_fir_filter(cosine_buff1, FILTER_SIZE1);

  /* Last element of each array will hold results of filtering. */
  /* Now do second stage */

  sine_buff2[0] = sine_buff1[FILTER_SIZE1-1];
  cosine_buff2[0] = cosine_buff1[FILTER_SIZE1-1];
  combined_buff[0] = sine_buff2[0]*sine_buff2[0] + cosine_buff2[0]*cosine_buff2[0];
  simple_fir_filter(sine_buff2, FILTER_SIZE2); 
  simple_fir_filter(cosine_buff2, FILTER_SIZE2); 
  simple_fir_filter(combined_buff, FILTER_SIZE2); 

  tight_amplitude = sine_buff2[FILTER_SIZE2-1]*sine_buff2[FILTER_SIZE2-1] + 
                    cosine_buff2[FILTER_SIZE2-1]*cosine_buff2[FILTER_SIZE2-1];
  loose_amplitude = combined_buff2[FILTER_SIZE2-1];
}

The code here uses 'double' for all math other than array subscripting. In practice, it would almost certainly be faster to replace some of the math with integer maths. On machines with floating point, I would expect the best approach would be to keep the phase as a 32-bit integer and use a table of ~4096 'single' sine values (the smaller the table size in RAM, the better the cache coherency performance). I used code very much like the above on a fixed-point (integer) DSP with great success; the sine and cosine computations in process_some_waves were done in separate "loops", with each "loop" being realized as a single instruction with a "repeat" prefix.

supercat
  • 77,689
  • 9
  • 166
  • 211
0

I have been reading up on Fourier analysis.

Basically if you wish to extract a frequency f out of a signal, you just throw in sine wave frequency f, multiply it with the original signal, and integrate

If the original signal didn't contain anything of frequency f, you should get pretty much zero. if it DOES, then you will get out a measure of how much energy in the signal is at that frequency.

Although there is some fairly tricky math behind it, it makes sense intuitively: just looking at it, everything in the signal that is at frequency f will interfere constructively with the sine wave leaving residue; everything not at frequency f could be essentially considered as random noise, (ie there is pretty much the same amount of stuff above zero as below) having no net effect when multiplied with our sine wave. everything cancels.

This correlates with what I was fishing for. To complete my analogy above: To check what notes a piano contains, you just hold the pedal down and sing a rising tone into it, and whenever a sympathetic resonance occurs you can jot down that the piano has a note at that frequency.

Of course this is not without its failings: If you hold down C1 (no pedal this time) and sing/play C2, C1 will resonate at twice its fundamental frequency producing a C2 sound.

Similarly playing G2 will make it resonate at three times its fundamental frequency etc

P i
  • 29,020
  • 36
  • 159
  • 267
  • 1
    The main problem with fourier transform is that it happens on fixed blocks. And only supports frequencies which are an integral multiple of the base frequency. Whether that's OK depends on what you require exactly. – CodesInChaos Nov 03 '10 at 19:10
  • I noticed this. So if I set my fourier transform's base frequency to say A0, it will detect all of the A's, as well as a handful of other notes like E2, B4 which happen to be harmonics of A0. (although in a modern tuning system it would only nail the octaves of A; the others would be slightly out). Whatever I set my base frequency to, I am always going to run the risk that a particular note lies right on the boundary between two bins. What happens in this case? Does half of the notes energy for into each Bin? Or is this signal lost entirely? – P i Nov 04 '10 at 05:34