
I need to measure signal frequency while the musicians play music, and the music happens to change a bit too fast for FFT (Fast Fourier Transform).

Musicians play music at 90-140 bpm. This means that there are 90-140 groups of notes each minute, up to 8 (more frequently, up to 4) notes in each group (60/140/8 = 0.0536 sec, 60/90/4 = 0.167 sec), that is, notes may change at the rate of 6-19 notes per second.
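The arithmetic above can be checked with a few lines of Python. This is only a sketch; the helper name `note_duration_s` is made up for illustration.

```python
def note_duration_s(bpm, notes_per_beat):
    """Duration of a single note in seconds at the given tempo."""
    return 60.0 / bpm / notes_per_beat


fastest = note_duration_s(140, 8)  # shortest note: 140 bpm, 8 notes per beat
slowest = note_duration_s(90, 4)   # longest note: 90 bpm, 4 notes per beat

print(f"note length: {fastest:.4f} s to {slowest:.3f} s")
print(f"note rate:   {1 / slowest:.1f} to {1 / fastest:.1f} notes per second")
```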

The music uses a logarithmic scale: the range between, say, 440Hz and 880Hz is divided into 12 notes, only 7 of which are used for melody. (Basically, they use only the white keys on the piano; when they want to shift the starting frequency, they use some of the black keys and don't use some white keys.) That is, the frequency of each next note is multiplied by 2^(1/12) = 1.05946.

To make things more complicated, the A (La) frequency may vary from 438 to 446 Hz. The string instruments in theory can be tuned, while the wind instruments depend on the air temperature and humidity, so the frequency happens to be re-negotiated by the musicians during the sound check.
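To make the scale concrete, here is a minimal sketch that maps a measured frequency to the nearest note and reports how far off it is. The function name `nearest_note`, the sharp-only note spelling, and the use of cents (1/100 of a semitone) are my assumptions, not something the musicians specified. The reference A is a parameter, so it can be set to whatever was agreed at the sound check, and the octave is deliberately discarded.

```python
import math

# Note names starting from A; sharps only, which is an arbitrary spelling choice.
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note name, deviation in cents) for a frequency, ignoring the octave."""
    semitones = 12.0 * math.log2(freq_hz / a4_hz)  # signed distance from A in semitones
    nearest = round(semitones)                     # closest note of the 12-note scale
    cents = 100.0 * (semitones - nearest)          # 100 cents = one semitone
    return NOTE_NAMES[nearest % 12], cents


for f in (440.0, 880.0, 446.0):
    name, cents = nearest_note(f)
    print(f"{f:6.1f} Hz -> {name} {cents:+.1f} cents")
```

With `a4_hz=446.0` the last line would report A with zero deviation, which is why the reference has to be configurable. If the out-of-tune percentages are read as fractions of a semitone, a 25% error is a deviation of 25 cents.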

Sometimes musicians and vocalists make errors in frequency; they call it being "out of tune". They want a device that would inform them of such out-of-tune errors. They have tuners, but the tuners require the same sound to be played for about 1 sec before they start showing anything. This works for tuning, but does not work while the music is being played.

Most likely, the tuner is doing FFT, and due to the formula

df = 1/T

waits for 1 second to get the 1Hz resolution.

For A=440Hz, the difference in frequency between two adjacent notes is 440*0.05946 = 26.16 Hz. To get that frequency resolution, one has to use an acquisition time of 1/26.16 = 0.038 sec. At 8 notes per beat that corresponds to 60*26.16/8 = 196 bpm, so at tempo=196bpm the FFT is just able to distinguish two notes, and at 98 bpm it is able to detect a 50% out-of-tune error, provided that it starts acquisition at the very moment that the pitch changes. If we allow the pitch to change in the course of an acquisition period, we get 49 bpm, which is just too slow. In addition, it is very desirable to be more precise about the frequency, say, to detect a 25% out-of-tune error.
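The same estimate as a small sketch, assuming df = 1/T and 8 notes per beat. The function names and the `error_fraction` parameter (the fraction of a semitone that must be resolved) are mine, introduced only for illustration.

```python
SEMITONE = 2 ** (1 / 12)  # frequency ratio between adjacent notes


def semitone_gap_hz(freq_hz):
    """Distance in Hz from a note to the next note up."""
    return freq_hz * (SEMITONE - 1)


def max_tempo_bpm(freq_hz, error_fraction=1.0, notes_per_beat=8):
    """Fastest tempo at which one window of length T = 1/df fits into one note."""
    df = semitone_gap_hz(freq_hz) * error_fraction  # required resolution in Hz
    window_s = 1.0 / df                             # acquisition time for that resolution
    return 60.0 / window_s / notes_per_beat


for fraction in (1.0, 0.5, 0.25):
    print(f"{fraction:4.0%} of a semitone at A=440: {max_tempo_bpm(440.0, fraction):.1f} bpm")
```

Note that resolving 25% of a semitone gives the same tempo limit as resolving 50% with a window that is allowed to straddle a pitch change, since both double the required time once more.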

Is there a way to measure frequency better than FFT, that is, with better resolution in less acquisition time? (At least 2 times better, ideally, 8 times better.) In exchange, I do not need to distinguish between notes of different octaves, e.g. both 440 and 880 may be recognized as A. (Probably, more trade-offs are possible, just nothing else comes to my mind right now.)

UPD Here's a really good drawing:

Note frequencies linked from Wikipedia

UPD2

I have found a PhD thesis and open source software (TARTINI -- the real-time music analysis tool) at:

http://miracle.otago.ac.nz/tartini/

(The pages are also available via the web archive service: http://web.archive.org = http://archive.org = http://waybackmachine.org )

18446744073709551615
  • 1
    You say *frequency*, but I suspect you mean *pitch* ? – Paul R Nov 12 '15 at 09:14
  • Yes, pitch, but pitch is frequency: "the perceived frequency of sound", I do not see much difference but probably I am too technical https://en.wikipedia.org/wiki/Pitch_(music)#Pitch_and_frequency . Anyway, what I can measure is frequency. – 18446744073709551615 Nov 12 '15 at 09:27
  • 2
    Actually this isn't just pedantry - it makes a significant difference if you're dealing with music. Frequency is a *physical* quantity, whereas pitch is a *percept*, and has a fairly complex relationship with the frequencies and amplitudes of the components of a given sound. An FFT (or more accurately a power spectrum derived from an FFT) will tell you the frequencies and amplitudes of the components, but getting from here to the perceived pitch is non-trivial (i.e. it's not just the frequency of the fundamental component or the loudest component). See: Harmonic Product Spectrum. – Paul R Nov 12 '15 at 09:47
  • 1
    Another piece of the puzzle that you may be missing: it sounds like you're assuming that sample windows will be consecutive, so you only get 1 pitch estimate per window, but a commonly used technique is to overlap successive sample windows, e.g. if you overlap each window by 75% then you get pitch estimates at 4 times the rate, but with the same resolution (albeit with some correlation between successive windows, due to the overlap). – Paul R Nov 12 '15 at 10:00
  • You may try wavelets or CQT (https://en.wikipedia.org/wiki/Constant_Q_transform) – Archie Nov 12 '15 at 11:59
  • A link to wavelets: https://www.eecis.udel.edu/~amer/CISC651/IEEEwavelet.pdf – 18446744073709551615 Nov 12 '15 at 13:03
  • @PaulR if I can overlap successive windows, why cannot I only partially fill the array, 1/4 with the samples and 3/4 with zeroes? (At least it will be zeroes rather than a different note.) – 18446744073709551615 Nov 12 '15 at 14:07
  • 2
    @18446744073709551615: that just gives you an N/4 point FFT with the output interpolated to N points - it doesn't magically give you the resolution of an N point FFT. – Paul R Nov 12 '15 at 14:50
  • 1
    BTW, since you're only at the theory stage here, might I suggest you take this to http://dsp.stackexchange.com ? It will be more on-topic there and you'll likely get better answers from people more knowledgeable than I. – Paul R Nov 12 '15 at 14:53
  • I'm voting to close this question as off-topic because it is about DSP rather than programming DSP – Raedwald Mar 17 '16 at 18:28

1 Answer


Regarding the FFT: assuming the narrow-band spectral content is sparse and well separated, in low enough background noise, frequency peaks can be interpolated or phase vocoded to much higher resolution than the FFT bin spacing (the bin spacing being the inverse of the length of the segment of actual time-domain data). Parabolic interpolation is common, but there are other, more accurate interpolation kernels. Phase vocoder frequency estimation methods require stationarity across 2 overlapped frames; however, the total span of those 2 frames can be relatively short.
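As an illustration of the parabolic interpolation mentioned above, here is a minimal NumPy sketch. The Hann window, the log-magnitude fit, and the function name `parabolic_peak_hz` are my choices for the example, not a recommendation of a particular kernel. It assumes a single dominant, clean sinusoid in the frame.

```python
import numpy as np


def parabolic_peak_hz(signal, sample_rate):
    """Estimate the strongest frequency by fitting a parabola through the
    log-magnitudes of the peak FFT bin and its two neighbours."""
    n = len(signal)
    mag = np.abs(np.fft.rfft(signal * np.hanning(n)))
    k = int(np.argmax(mag[1:-1])) + 1             # skip DC and Nyquist so k-1, k+1 exist
    a, b, c = np.log(mag[k - 1:k + 2] + 1e-12)    # small constant guards against log(0)
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)    # vertex of the parabola, in bins
    return (k + offset) * sample_rate / n


sample_rate = 44100
n = 2048                                          # about 46 ms, bin spacing about 21.5 Hz
t = np.arange(n) / sample_rate
tone = np.sin(2 * np.pi * 446.0 * t)              # synthetic test tone, A tuned to 446 Hz
print(f"bin spacing: {sample_rate / n:.1f} Hz")
print(f"estimate:    {parabolic_peak_hz(tone, sample_rate):.2f} Hz")
```

On this synthetic tone the estimate lands well inside one bin, but real instrument signals with overlapping partials or noise will not behave this nicely.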

But the peak spectral frequency reported by an FFT is not the same as the pitch frequency perceived by a human, as voices and many musical instruments can radiate more acoustic spectral energy in an overtone series than at the pitch frequency, sometimes slightly inharmonically. There are algorithms more suited to pitch estimation than FFTs (alone). A partial list is in this answer: "FFT on iPhone to ignore background noise and find lower pitches".
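To show the difference between the spectral peak and the pitch, here is a sketch of one commonly used time-domain approach, autocorrelation. That choice is my assumption for the example; it is not necessarily the method a given tuner uses. The function name, the `fmin`/`fmax` search limits, and the synthetic signal are all made up for illustration. The test signal has a weak fundamental at 220 Hz and a stronger second harmonic, so the largest FFT peak sits near 440 Hz while the waveform still repeats 220 times per second.

```python
import numpy as np


def autocorr_pitch_hz(signal, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate pitch as the lag of the highest autocorrelation peak."""
    signal = signal - np.mean(signal)
    n = len(signal)
    spectrum = np.fft.rfft(signal, 2 * n)             # zero-pad to avoid circular wrap-around
    acf = np.fft.irfft(np.abs(spectrum) ** 2)[:n]     # autocorrelation via the power spectrum
    lo = int(sample_rate / fmax)                      # shortest period considered, in samples
    hi = int(sample_rate / fmin)                      # longest period considered, in samples
    lag = lo + int(np.argmax(acf[lo:hi]))
    return sample_rate / lag


sample_rate = 44100
t = np.arange(2048) / sample_rate                     # about 46 ms of signal
f0 = 220.0
voice = (0.3 * np.sin(2 * np.pi * f0 * t)             # weak fundamental
         + 1.0 * np.sin(2 * np.pi * 2 * f0 * t)       # strong 2nd harmonic
         + 0.5 * np.sin(2 * np.pi * 3 * f0 * t))      # 3rd harmonic
print(f"autocorrelation pitch: {autocorr_pitch_hz(voice, sample_rate):.1f} Hz")
```

The integer lag limits the precision (one sample of lag is roughly 1 Hz at 220 Hz and over 4 Hz at 440 Hz at this sample rate), so a practical version would interpolate around the autocorrelation peak.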

Many academic papers on pitch estimation methods for music can be found on the music-ir/MIREX site: http://www.music-ir.org/mirex/wiki/MIREX_HOME

hotpaw2