This is the easiest way to think about note onset:
think of a music signal as a flat constant signal. When and onset occurs you look at it as a large rapid CHANGE in signal (a positive or negative peak)
What this means in the frequency domain:
the FT of a constant signal is, well, CONSTANT! and flat
When the onset event occurs there is a rapid increase in spectrial content.
While you may think "Well you're actually talking about the peak of the onset right?" not at all. We are not actually interested in the peak of the onset, but rather the rising edge of the signal. When there is a sharp increase in the signal, the high frequency content increases.
one way to do this is using the spectrial difference function:
take your time domain signal and cut it up into overlaping strips (typically 50% overlap)
apply a hamming/hann window (this is to reduce spectrial smudging) (remember cutting up the signal into windows is like multiplying it by a pulse, in the frequency domain its like convolving the signal with a sinc function)
Apply the FFT algorithm on two sucessive windows
For each DFT bin, calculate the difference between the Xn and Xn-1 bins if it is negative set it to zero
square the results and sum all th bins together
repeat till end of signal.
look for peaks in signal using median thresholding and there are your onset times!
Source:
https://adamhess.github.io/Onset_Detection_Nov302011.pdf
and
http://www.elec.qmul.ac.uk/people/juan/Documents/Bello-TSAP-2005.pdf