Apply FFT to audio recording in java

Question

I have seen questions similar to this one on this website, but my question is a little different. The code I am using to capture the audio is this. I would like to simply take the captured audio and apply a 256-point FFT to it.

I realized that this line count = line.read(buffer, 0, buffer.length); breaks up the audio into "chunks".

Also the FFT I am using can be found here.

My questions are:

I would like to know if there is a way to apply the FFT to the whole audio recording, not just a buffered amount.
I see that the code for the FFT requires a real and an imaginary part. How would I get the real and imaginary parts from the audio captured by that code?

You can't 'simply' do this. You *can* complicatedly do it by converting the bytes to audio samples (manually, as it were). Have you looked into it much? — Radiodef, Jan 31 '14 at 00:01
I was advised by one of my lecturers to do it this way, so I hadn't thought about doing it the way you suggest. Is there an easier way? I really need a 256-point FFT. I will definitely read up on the method you suggested, though. — Baker Johnson, Jan 31 '14 at 00:14
What I said is basically the only way to do it with Java. All you can do with Java sound is read the raw bytes in and write them to the output. It's absolutely possible to then "intercept" the stream, convert them yourself and do whatever you want with them. — Radiodef, Jan 31 '14 at 00:17
Is this how I would convert them? http://stackoverflow.com/questions/20218634/change-an-array-of-byte-of-audio-sample-to-frequency — Baker Johnson, Jan 31 '14 at 00:27

score 6 · Accepted Answer · edited May 23 '17 at 12:22

All the javax.sound.sampled package does is read the raw bytes from the file and write them to the output. So there's an 'in between' step that you have to do yourself: converting the bytes to samples.

The following shows how to do this (with comments) for PCM, taken from my code example WaveformDemo:

import javax.sound.sampled.AudioFormat;

public static float[] unpack(
    byte[] bytes,
    long[] transfer,
    float[] samples,
    int bvalid,
    AudioFormat fmt
) {
    if(fmt.getEncoding() != AudioFormat.Encoding.PCM_SIGNED
            && fmt.getEncoding() != AudioFormat.Encoding.PCM_UNSIGNED) {

        return samples;
    }

    final int bitsPerSample = fmt.getSampleSizeInBits();
    final int normalBytes = normalBytesFromBits(bitsPerSample);

    /*
     * not the most DRY way to do this but it's a bit more efficient.
     * otherwise there would either have to be 4 separate methods for
     * each combination of endianness/signedness or do it all in one
     * loop and check the format for each sample.
     * 
     * a helper array (transfer) allows the logic to be split up
     * but without being too repetitive.
     * 
     * here there are two loops converting bytes to raw long samples.
     * integral primitives in Java get sign extended when they are
     * promoted to a larger type so the & 0xffL mask keeps them intact.
     * 
     */

    if(fmt.isBigEndian()) {
        for(int i = 0, k = 0, b; i < bvalid; i += normalBytes, k++) {
            transfer[k] = 0L;

            int least = i + normalBytes - 1;
            for(b = 0; b < normalBytes; b++) {
                transfer[k] |= (bytes[least - b] & 0xffL) << (8 * b);
            }
        }
    } else {
        for(int i = 0, k = 0, b; i < bvalid; i += normalBytes, k++) {
            transfer[k] = 0L;

            for(b = 0; b < normalBytes; b++) {
                transfer[k] |= (bytes[i + b] & 0xffL) << (8 * b);
            }
        }
    }

    final long fullScale = (long)Math.pow(2.0, bitsPerSample - 1);

    /*
     * only the first bvalid bytes were filled by read, so only
     * that many samples are converted. anything past the end of
     * a short read is stale and is left alone.
     * 
     */

    final int svalid = bvalid / normalBytes;

    /*
     * the OR is not quite enough to convert,
     * the sign needs to be corrected.
     * 
     */

    if(fmt.getEncoding() == AudioFormat.Encoding.PCM_SIGNED) {

        /*
         * if the samples were signed, they must be
         * extended to the 64-bit long.
         * 
         * so first check if the sign bit was set
         * and if so, extend it.
         * 
         * as an example, imagining these were 4-bit samples originally
         * and the destination is 8-bit, a mask can be constructed
         * with -1 (all bits 1) and a left shift:
         * 
         *     11111111
         *  <<  (4 - 1)
         *  ===========
         *     11111000
         * 
         * (except the destination is 64-bit and the original
         * bit depth from the file could be anything.)
         * 
         * then supposing we have a hypothetical sample -5
         * that ought to be negative, an AND can be used to check it:
         * 
         *    00001011
         *  & 11111000
         *  ==========
         *    00001000
         * 
         * and an OR can be used to extend it:
         * 
         *    00001011
         *  | 11111000
         *  ==========
         *    11111011
         * 
         */

        final long signMask = -1L << bitsPerSample - 1L;

        for(int i = 0; i < svalid; i++) {
            if((transfer[i] & signMask) != 0L) {
                transfer[i] |= signMask;
            }
        }
    } else {

        /*
         * unsigned samples are easier since they
         * will be read correctly into the long.
         * 
         * so just sign them:
         * subtract 2^(bits - 1) so the center is 0.
         * 
         */

        for(int i = 0; i < svalid; i++) {
            transfer[i] -= fullScale;
        }
    }

    /* finally normalize to range of -1.0f to 1.0f */

    for(int i = 0; i < svalid; i++) {
        samples[i] = (float)transfer[i] / (float)fullScale;
    }

    return samples;
}

public static int normalBytesFromBits(int bitsPerSample) {

    /*
     * some formats allow for bit depths in non-multiples of 8.
     * they will, however, typically pad each sample out to a
     * whole number of bytes. AIFF is one of these formats.
     * 
     * so the expression:
     * 
     *  bitsPerSample + 7 >> 3
     * 
     * computes a division of 8 rounding up (for positive numbers).
     * 
     * this is basically equivalent to:
     * 
     *  (int)Math.ceil(bitsPerSample / 8.0)
     * 
     */

    return bitsPerSample + 7 >> 3;
}

That piece of code produces a float[] and your FFT wants a double[], but that's a fairly simple change. transfer and samples are arrays of length equal to bytes.length / normalBytes, and bvalid is the return value from read. My code example assumes an AudioInputStream, but the same conversion should be applicable to a TargetDataLine. I am not sure you can literally copy and paste it, but it's an example.
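As a rough illustration of the double[] change and of collecting a whole recording instead of one buffer, here is a minimal sketch. It does not use unpack: it assumes 16-bit signed little-endian PCM (other formats need the general conversion above), keeps every chunk from the same kind of read loop as in the question, and the class and method names (WholeRecording, readAll16BitLE) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;

public class WholeRecording {

    // Reads every chunk until end of stream and converts all of it to doubles
    // in [-1.0, 1.0). Assumption: 16-bit signed little-endian PCM; if there are
    // several channels, the samples stay interleaved.
    static double[] readAll16BitLE(AudioInputStream in) throws IOException {
        AudioFormat fmt = in.getFormat();
        if (!fmt.getEncoding().equals(AudioFormat.Encoding.PCM_SIGNED)
                || fmt.getSampleSizeInBits() != 16 || fmt.isBigEndian()) {
            throw new IllegalArgumentException("expected 16-bit signed little-endian PCM");
        }
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int count;
        // same chunked read as in the question, but each chunk is appended
        while ((count = in.read(buffer, 0, buffer.length)) != -1) {
            all.write(buffer, 0, count);
        }
        byte[] bytes = all.toByteArray();
        double[] samples = new double[bytes.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int lo = bytes[2 * i] & 0xff; // low byte: masked, so unsigned
            int hi = bytes[2 * i + 1];    // high byte: keeps its sign
            samples[i] = ((hi << 8) | lo) / 32768.0;
        }
        return samples;
    }

    public static void main(String[] args) throws IOException {
        AudioFormat fmt = new AudioFormat(44100f, 16, 1, true, false);
        byte[] pcm = {0x00, 0x40, 0x00, (byte) 0xC0}; // +16384, -16384
        AudioInputStream in =
                new AudioInputStream(new ByteArrayInputStream(pcm), fmt, 2);
        double[] s = readAll16BitLE(in);
        System.out.println(s[0] + " " + s[1]); // prints "0.5 -0.5"
    }
}
```

With a TargetDataLine the loop would instead run until you stop recording, since a live line has no end of stream.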

Regarding your two questions:

You can take one very long FFT of the entire recording, or take an FFT of each buffer and average the results.
The FFT you linked to computes in place. So the real part is the audio samples, and the imaginary part is an array of zeros with the same length as the real part.
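Both options can be sketched as follows. The linked FFT class isn't reproduced here, so this substitutes a textbook iterative radix-2 Cooley-Tukey FFT as a hypothetical stand-in. Like the linked class, it works in place on a real array and an imaginary array, and it only handles power-of-two lengths. wholeRecording zero-pads the recording up to the next power of two. averagedMagnitudes takes consecutive n-sample frames (n = 256 here), fills each imaginary array with zeros, and averages the per-frame magnitudes. Magnitudes are averaged rather than complex values, because phases differ from frame to frame. All names are hypothetical.

```java
import java.util.Arrays;

public class Fft256 {

    // Hypothetical stand-in for the linked FFT class: iterative radix-2
    // Cooley-Tukey, computed in place. re.length must be a power of two.
    static void fft(double[] re, double[] im) {
        int n = re.length;
        // reorder the input into bit-reversed index order
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // combine butterflies of size 2, 4, 8, ..., n
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2 * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k);
                    double wi = Math.sin(ang * k);
                    int a = start + k;
                    int b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr;
                    im[b] = im[a] - xi;
                    re[a] += xr;
                    im[a] += xi;
                }
            }
        }
    }

    // Option 1: one long FFT of the whole recording, zero-padded to a power of two.
    static double[][] wholeRecording(double[] samples) {
        int n = Integer.highestOneBit(Math.max(samples.length, 1));
        if (n < samples.length) {
            n <<= 1;
        }
        double[] re = Arrays.copyOf(samples, n); // real part: samples, then zeros
        double[] im = new double[n];             // imaginary part: all zeros
        fft(re, im);
        return new double[][] {re, im};
    }

    // Option 2: an n-point FFT of each frame, with the magnitudes averaged.
    static double[] averagedMagnitudes(double[] samples, int n) {
        double[] avg = new double[n];
        int frames = samples.length / n; // a trailing partial frame is dropped
        for (int f = 0; f < frames; f++) {
            double[] re = Arrays.copyOfRange(samples, f * n, f * n + n);
            double[] im = new double[n]; // imaginary part starts as zeros
            fft(re, im);
            for (int k = 0; k < n; k++) {
                avg[k] += Math.hypot(re[k], im[k]);
            }
        }
        for (int k = 0; k < n; k++) {
            avg[k] /= Math.max(frames, 1);
        }
        return avg;
    }

    public static void main(String[] args) {
        int n = 256;
        double[] samples = new double[4 * n];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = Math.cos(2 * Math.PI * 8 * i / n); // lands exactly on bin 8
        }
        double[] mag = averagedMagnitudes(samples, n);
        System.out.println("bin 8: " + mag[8] + ", bin 248: " + mag[248]);
    }
}
```

For that unit-amplitude cosine, the averaged spectrum peaks at about 128 (n / 2) in both bin 8 and its mirror, bin 248. The mirrored upper half is the part discussed next.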

But when the FFT is done, there are still a couple of things you have to do that I don't see the linked class doing:

Convert to polar coordinates.
Typically, discard the negative frequencies (the entire upper half of the spectrum, which is a mirror image of the lower half).
Potentially scale the resulting magnitudes (the real part) by dividing them by the length of the transform.
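Here is a minimal sketch of those three steps. It assumes the FFT output sits in two arrays, re and im, of length n, as an in-place FFT leaves it, and the names are hypothetical. The polar conversion uses magnitude = sqrt(re^2 + im^2) via Math.hypot, and phase via Math.atan2, which picks the correct quadrant. Only bins 0 through n/2 are kept, because for real input the bins above n/2 mirror the ones below. This sketch also keeps bin n/2 itself (the Nyquist bin). Magnitudes are divided by n.

```java
public class SpectrumPost {

    // Converts in-place FFT output (re, im) of length n to polar form,
    // keeps only bins 0..n/2, and divides the magnitudes by n.
    // Returns {magnitude, phase}.
    static double[][] toPolarHalfScaled(double[] re, double[] im) {
        int n = re.length;
        int bins = n / 2 + 1; // bins above n/2 mirror bins below it for real input
        double[] mag = new double[bins];
        double[] phase = new double[bins];
        for (int k = 0; k < bins; k++) {
            mag[k] = Math.hypot(re[k], im[k]) / n; // sqrt(re^2 + im^2), scaled by 1/n
            phase[k] = Math.atan2(im[k], re[k]);   // quadrant-correct arctan(im / re)
        }
        return new double[][] {mag, phase};
    }

    public static void main(String[] args) {
        // the FFT of the impulse {1, 0, 0, 0} is {1, 1, 1, 1} with zero imaginary part
        double[] re = {1, 1, 1, 1};
        double[] im = {0, 0, 0, 0};
        double[][] polar = toPolarHalfScaled(re, im);
        System.out.println(java.util.Arrays.toString(polar[0])); // [0.25, 0.25, 0.25]
    }
}
```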

Edit, related:

How do I use audio sample data from Java Sound?

Thank you so much for your response, it was quite thorough! Exactly what I wanted. I just wanted to know if there is any material I could read to figure out how to convert to polar coordinates. — Baker Johnson, Feb 01 '14 at 19:27
You're welcome. For polar coordinates, I guess, http://www.dspguide.com/ch8/8.htm (equation 8-6). Just make a little method that calculates the equation. — Radiodef, Feb 01 '14 at 19:43
