Fourier transforming a byte array

Question

I am not so proficient in Java, so please keep it quite simple. I will, though, try to understand everything you post. Here's my problem.

I have written code to record audio from an external microphone and store it in a .wav file. Storing this file is relevant for archiving purposes. What I need to do is an FFT of the stored audio.

My approach was to load the .wav file as a byte array and transform that, which raises two problems: 1. There's a header in the way that I need to get rid of, but I should be able to do that. 2. I get a byte array, but most if not all FFT algorithms I found online and tried to patch into my project work with complex numbers or two double arrays.

I tried to work around both these problems and finally was able to plot my FFT array as a graph, when I found out it was just giving me back "0"s. The .wav file is fine though, I can play it back without problems. I thought maybe converting the bytes into doubles was the problem for me, so here's my approach to that (I know it's not pretty)

byte ByteArray[] = Files.readAllBytes(wav_path);
String s = new String(ByteArray);
double[] DoubleArray = toDouble(ByteArray);
// build 2^n array, fill up with zeroes
boolean exp = false;
int i = 0;
int pow = 0;
while (!exp) {
    pow = (int) Math.pow(2, i);
    if (pow > ByteArray.length) {
        exp = true;
    } else {
        i++;
    }
}
System.out.println(pow);
double[] Filledup = new double[pow];
for (int j = 0; j < DoubleArray.length; j++) {
    Filledup[j] = DoubleArray[j];
    System.out.println(DoubleArray[j]);
}
for (int k = DoubleArray.length; k < Filledup.length; k++) {
    Filledup[k] = 0;
}

This is the function I'm using to convert the byte array into a double array:

public static double[] toDouble(byte[] byteArray) {
    ByteBuffer byteBuffer = ByteBuffer.wrap(byteArray);
    double[] doubles = new double[byteArray.length / 8];
    for (int i = 0; i < doubles.length; i++) {
        doubles[i] = byteBuffer.getDouble(i * 8);
    }
    return doubles;
}

The header is still in there, I know that, but that should be the smallest problem right now. I converted my byte array to a double array, then padded that array with zeroes up to the next power of 2, so that the FFT can actually work (it needs an array of 2^n values). The FFT algorithm I'm using takes two double arrays as input, one being the real part, the other the imaginary part. I read that for this to work, I'd have to keep the imaginary array empty (but with the same length as the real array).

Worth mentioning: I'm recording at 44100 Hz, 16 bit and mono.

If necessary, I'll post the FFT I'm using.

If I try to print the values of the double array, I get kind of weird results:

...
-2.0311904060823147E236
-1.3309975624948503E241
1.630738286366793E-260
1.0682002560745842E-255
-5.961832069690704E197
-1.1476447092561027E164
-1.1008407401197794E217
-8.109566204271759E298
-1.6104556241572942E265
-2.2081172620352248E130
NaN
3.643749694745671E-217
-3.9085815506127892E202
-4.0747557114875874E149
...

I know that somewhere the problem lies with me overlooking something very simple I should be aware of, but I can't seem to find the problem. My question finally is: How can I get this to work?

The question is how you are transforming the byte value to a double value? This part of the code is not shown. Do you use https://docs.oracle.com/javase/8/docs/api/java/lang/Byte.html#doubleValue-- ? — lschuetze, Apr 10 '15 at 12:08
You are talking about a header. Is it part of the byte array? If so, you have to skip the header's bytes before reading the doubles. — T.Gounelle, Apr 10 '15 at 12:25
The source array is not "complex". Most algorithms, however, produce a "complex" output that includes both "real" and "imaginary" frequency-domain data. It's common to combine the real and imaginary values into a "magnitude" number by taking the square root of the sum of the squares (or simply treat the sum of the squares as a "power" value). You get half as many usable frequency "buckets" out as the number of time-domain values you fed in, covering 0 Hz up to the "Nyquist frequency" (half the sample rate). — Hot Licks, Apr 10 '15 at 12:27
It is still part of the byte array; it should be the first 44 bytes. I can get rid of it immediately, but it should not affect the FFT significantly. — Furious Fry, Apr 10 '15 at 12:29
@Hot Licks I found some algorithms that use a "pseudo complex" type meaning an array of tuples to store the real and imaginary part in a single array. That's what I meant by complex source arrays. — Furious Fry, Apr 10 '15 at 12:32
Read more closely. Often the complex data is returned in the same array that was used to input the "real" time-domain data, since the two arrays would be the same size. Real and imaginary output values may be stored in alternating array elements or there may be N real values followed by N imaginary values. — Hot Licks, Apr 10 '15 at 12:36
That's true. Both real and imaginary values get stored in a single array in alternating elements. I can easily comment out the part where the FFT algorithm is filling in the imaginary information, and then get back just an array of appropriate length, but full of 0s. — Furious Fry, Apr 10 '15 at 12:46
What's the reason for the / 8 and * 8 in your code? Don't tell me it's because a double is 8 bytes, that doesn't make sense. Or is the input really an array of doubles? — Bram, Apr 10 '15 at 17:00

score 5 · Accepted Answer · edited May 23 '17 at 10:28

There's a header in the way I need to get rid of […]

You need to use javax.sound.sampled.AudioInputStream to read the file if you want to "skip" the header. This is useful to learn anyway, because you would need the data in the header to interpret the bytes if you did not know the exact format ahead of time.

I'm recording at 44100 Hz, 16 bit and mono.

So, this almost certainly means the data in the file is encoded as 16-bit integers (short in Java nomenclature).

Right now, your ByteBuffer code makes the assumption that it's already 64-bit floating point and that's why you get strange results. In other words, you are reinterpreting the binary short data as if it were double.
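
To see the effect in isolation, here is a minimal sketch. The class and method names (ReinterpretDemo, pack, asDouble, asSamples) are made up for illustration and are not part of the question's code; it assumes 16-bit little-endian samples, which is the usual layout for a PCM WAV file. It packs four known samples into bytes, then reads the same bytes once as a single 8-byte double and once as four shorts.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class ReinterpretDemo {
    // Packs 16-bit samples into little-endian bytes (assumed WAV PCM layout).
    static byte[] pack(short[] samples) {
        ByteBuffer bb = ByteBuffer.allocate(samples.length * 2)
                                  .order(ByteOrder.LITTLE_ENDIAN);
        for (short s : samples) {
            bb.putShort(s);
        }
        return bb.array();
    }

    // Wrong: treats the first 8 bytes as one IEEE 754 double,
    // which is what getDouble(i * 8) does in the question.
    static double asDouble(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getDouble(0);
    }

    // Right: decodes each 2-byte sample as a short and scales it
    // to the range -1.0 <= x < 1.0.
    static double[] asSamples(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        double[] out = new double[bytes.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = bb.getShort() / 32768.0;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = pack(new short[] {16384, -16384, 0, -32768});
        System.out.println(asDouble(bytes));                   // an unrelated value
        System.out.println(Arrays.toString(asSamples(bytes))); // the actual samples
    }
}
```

The same eight bytes yield one meaningless double in the first case and four sensible samples in the second.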

What you need to do is read in the short data and then convert it to double.

For example, here's a rudimentary routine that does what you're trying to do (supporting 8-, 16-, 32- and 64-bit signed integer PCM):

import javax.sound.sampled.*;
import javax.sound.sampled.AudioFormat.Encoding;
import java.io.*;
import java.nio.*;

static double[] readFully(File file)
throws UnsupportedAudioFileException, IOException {
    AudioInputStream in = AudioSystem.getAudioInputStream(file);
    AudioFormat     fmt = in.getFormat();

    byte[] bytes;
    try {
        if(fmt.getEncoding() != Encoding.PCM_SIGNED) {
            throw new UnsupportedAudioFileException();
        }

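        // read the data fully: a single read() may return fewer bytes
        // than requested, so loop until the end of the stream
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        bytes = out.toByteArray();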
    } finally {
        in.close();
    }

    int   bits = fmt.getSampleSizeInBits();
    double max = Math.pow(2, bits - 1);

    ByteBuffer bb = ByteBuffer.wrap(bytes);
    bb.order(fmt.isBigEndian() ?
        ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);

    double[] samples = new double[bytes.length * 8 / bits];
    // convert sample-by-sample to a scale of
    // -1.0 <= samples[i] < 1.0
    for(int i = 0; i < samples.length; ++i) {
        switch(bits) {
            case 8:  samples[i] = ( bb.get()      / max );
                     break;
            case 16: samples[i] = ( bb.getShort() / max );
                     break;
            case 32: samples[i] = ( bb.getInt()   / max );
                     break;
            case 64: samples[i] = ( bb.getLong()  / max );
                     break;
            default: throw new UnsupportedAudioFileException();
        }
    }

    return samples;
}
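
The samples returned by a routine like this still have to be padded to a power-of-two length before a radix-2 FFT can use them, as the question describes. A compact way to do that is sketched below; the names PadDemo, nextPowerOfTwo and padToPowerOfTwo are illustrative only. Unlike the loop in the question, it sizes the result from the sample count rather than the byte count, and it leaves a length that is already a power of two unchanged.

```java
import java.util.Arrays;

class PadDemo {
    // Smallest power of two that is >= n, for 1 <= n <= 2^30.
    static int nextPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("length out of range: " + n);
        }
        int p = Integer.highestOneBit(n);
        return p == n ? p : p << 1;
    }

    // Copies the samples into a longer array; Arrays.copyOf fills
    // the extra elements with 0.0.
    static double[] padToPowerOfTwo(double[] samples) {
        return Arrays.copyOf(samples, nextPowerOfTwo(samples.length));
    }
}
```

Usage would look like `double[] padded = PadDemo.padToPowerOfTwo(samples);`.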

The FFT algorithm I'm using takes two double arrays as input, one being the real part, the other the imaginary part. I read that for this to work, I'd have to keep the imaginary array empty (but with the same length as the real array).

That's right. The real part is the audio sample array from the file; the imaginary part is an array of equal length, filled with zeros, e.g.:

double[] realPart = mySamples;
double[] imagPart = new double[realPart.length];
myFft(realPart, imagPart);
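
After the transform, the real and imaginary outputs are usually combined into magnitudes, as mentioned in the comments on the question. The sketch below shows one way to do that. It assumes the FFT overwrites both arrays in place, which may not match your implementation; a naive O(n^2) DFT stands in for myFft so the example is self-contained, and all names (SpectrumDemo, naiveDft, magnitudes, binFrequency) are hypothetical.

```java
class SpectrumDemo {
    // Naive DFT standing in for myFft; overwrites re and im in place.
    // Far too slow for real audio, but fine for a small illustration.
    static void naiveDft(double[] re, double[] im) {
        int n = re.length;
        double[] outRe = new double[n];
        double[] outIm = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * ((double) k * t) / n;
                double c = Math.cos(angle);
                double s = Math.sin(angle);
                outRe[k] += re[t] * c - im[t] * s;
                outIm[k] += re[t] * s + im[t] * c;
            }
        }
        System.arraycopy(outRe, 0, re, 0, n);
        System.arraycopy(outIm, 0, im, 0, n);
    }

    // Magnitude sqrt(re^2 + im^2) of the first n/2 bins; for real input
    // the upper half of the spectrum mirrors the lower half.
    static double[] magnitudes(double[] re, double[] im) {
        double[] mag = new double[re.length / 2];
        for (int k = 0; k < mag.length; k++) {
            mag[k] = Math.hypot(re[k], im[k]);
        }
        return mag;
    }

    // Centre frequency in Hz of bin k for an n-point transform.
    static double binFrequency(int k, int n, double sampleRate) {
        return k * sampleRate / n;
    }
}
```

With a sample rate of 44100 Hz, bin k of an n-point transform corresponds to k * 44100 / n Hz, so plotting magnitudes against binFrequency gives the spectrum.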

More info: "How do I use audio sample data from Java Sound?"

That was incredibly helpful to understand the reading process from a wav file, thank you. — Furious Fry, Apr 13 '15 at 09:18

score 1 · Answer 2 · answered Apr 10 '15 at 19:09

The samples in a wave file are not going to be already 8-byte doubles that can be directly copied as per your posted code.

You need to look up (partly from the WAVE header format and partly from the RIFF specification) the data type, format, length and endianness of the samples before converting them to doubles.

Try 2 byte little-endian signed integers as a likely possibility.
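
A minimal sketch of that guess, working directly on the bytes read from the file: it assumes a fixed 44-byte header (the size mentioned in the question's comments; files with extra chunks have longer headers) and 16-bit little-endian mono samples. The names WavBytesDemo, HEADER_BYTES and decode16BitLittleEndian are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class WavBytesDemo {
    // Assumption: canonical PCM header with no additional chunks.
    static final int HEADER_BYTES = 44;

    // Skips the assumed header and decodes each 2-byte little-endian
    // sample, scaled to the range -1.0 <= x < 1.0.
    static double[] decode16BitLittleEndian(byte[] fileBytes) {
        int count = Math.max(0, (fileBytes.length - HEADER_BYTES) / 2);
        ByteBuffer bb = ByteBuffer.wrap(fileBytes).order(ByteOrder.LITTLE_ENDIAN);
        double[] samples = new double[count];
        for (int i = 0; i < count; i++) {
            samples[i] = bb.getShort(HEADER_BYTES + 2 * i) / 32768.0;
        }
        return samples;
    }
}
```

If the decoded values look like noise, the header length or sample format assumed here probably does not match the file, and reading the format from the header is the safer route.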
