
I'm having trouble figuring out what I actually read with the AudioInputStream. The program below just prints the byte array it gets, but I don't even know whether those bytes are the samples, i.e. whether the byte array is the audio wave.

import java.io.File;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

File fileIn;
AudioInputStream audio_in;
byte[] audioBytes;
int numBytesRead;
int numFramesRead;
int numBytes;
int totalFramesRead;
int bytesPerFrame;

try {
        audio_in = AudioSystem.getAudioInputStream(fileIn);
        bytesPerFrame = audio_in.getFormat().getFrameSize();

        if (bytesPerFrame == AudioSystem.NOT_SPECIFIED) {
            // Some formats report no frame size; fall back to single bytes.
            bytesPerFrame = 1;
        }

        // Buffer large enough for 1024 frames at a time.
        numBytes = 1024 * bytesPerFrame;
        audioBytes = new byte[numBytes];
        numBytesRead = 0;
        numFramesRead = 0;
    } catch (Exception e) {
        System.out.println("Something went completely wrong");
    }

and in some other part, I read the bytes with this:

try {
        if ((numBytesRead = audio_in.read(audioBytes)) != -1) {                 
              numFramesRead = numBytesRead / bytesPerFrame;                 
              totalFramesRead += numFramesRead;            
        }
    } catch (Exception e) {
        System.out.println("Had problems reading new content");
    }

First of all, this code is not mine. This is my first time reading audio files, so I got some help from the inter-webs. (Found the link: Java - reading, manipulating and writing WAV files on Stack Overflow, who would have known.)

The question is: what do the bytes in audioBytes represent? Since the source is 44.1 kHz stereo, there have to be two waves hiding in there somewhere, am I right? So how do I filter the important information out of these bytes?

// EDIT

So what I added is this function:

public short[] Get_Sample() {
    if (samplesRead == 1024) {
        Read_Buffer();
        samplesRead = 4;
    } else {
        samplesRead = samplesRead + 4;
    }
    short[] sample = new short[2];
    // Little-endian: low byte first. The low byte must be masked with 0xFF,
    // because Java bytes are signed and would otherwise sign-extend.
    sample[0] = (short) ((audioBytes[samplesRead - 4] & 0xFF) | (audioBytes[samplesRead - 3] << 8));
    sample[1] = (short) ((audioBytes[samplesRead - 2] & 0xFF) | (audioBytes[samplesRead - 1] << 8));
    return sample;
}

where Read_Buffer() reads the next 1024 (or fewer) bytes and loads them into audioBytes. sample[0] is used for the left channel, sample[1] for the right. But I'm still not sure, since the waves I get from this look quite "noisy". (Edit: the WAV actually used little-endian byte order, so I had to change the calculation.)
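The per-frame decoding above can be checked in isolation. Here is a minimal, self-contained sketch of the same little-endian conversion (the class and method names `FrameDecoder`/`decodeFrame` are made up for illustration); the key point is masking the low byte before combining:

```java
public class FrameDecoder {
    // Decodes one 16-bit little-endian stereo frame starting at byte
    // offset 'pos' in 'buf'. Masking the low byte with 0xFF is required
    // because Java bytes are signed.
    public static short[] decodeFrame(byte[] buf, int pos) {
        short left  = (short) ((buf[pos]     & 0xFF) | (buf[pos + 1] << 8));
        short right = (short) ((buf[pos + 2] & 0xFF) | (buf[pos + 3] << 8));
        return new short[] { left, right };
    }

    public static void main(String[] args) {
        // One frame: -1 on the left channel, +256 on the right.
        byte[] frame = { (byte) 0xFF, (byte) 0xFF, 0x00, 0x01 };
        short[] s = decodeFrame(frame, 0);
        System.out.println(s[0] + " " + s[1]); // prints "-1 256"
    }
}
```

Without the `& 0xFF` mask, a low byte of 0xFF would sign-extend to -1 and corrupt the combined value, which is one common cause of "noisy"-looking waveforms.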

ruhig brauner
  • *"so how do I filter the important informations out of these bytes?"* What exact 'important information' do you think a single frame of the audio input stream actually contains? – Andrew Thompson Jun 28 '13 at 18:02
  • The actual sample. Since it is stereo, there have to be 2 values for every sample, am I right? – ruhig brauner Jun 28 '13 at 18:34
  • *"Since it is stereo here have to be 2 valuesfor every sample, am I right?"* Yes. But note that if it is 16 bit (typical for 44.1KHz stereo), there will be 4 bytes per frame and 2 bytes per channel. – Andrew Thompson Jun 28 '13 at 19:05
  • Well, that helps a lot. :) So what actually gets copied to the audioBytes array is 4 bytes per sample, where 2 bytes are from the left channel and 2 bytes are from the right? How do I know which bytes are from which side, and how do I "combine" them? – ruhig brauner Jun 28 '13 at 19:15

1 Answer


The AudioInputStream read() method returns the raw audio data. You don't know the 'construction' of that data until you read the audio format with getFormat(), which returns an AudioFormat. From the AudioFormat you can call getChannels(), getSampleSizeInBits() and more... This is because the AudioInputStream is made for a known format.

When you calculate a sample value there are different possibilities for the signedness and endianness of the data (in the case of a 16-bit sample). To make the code more generic, use the AudioFormat object returned from the AudioInputStream to get more info about the data buffer:

  • getEncoding() : PCM_SIGNED, PCM_UNSIGNED ...
  • isBigEndian() : true or false

As you already discovered, incorrect sample building may lead to distorted sound. If you work with various files it may cause problems in the future. If you won't provide support for some formats, just check what the AudioFormat says and throw an exception (e.g. javax.sound.sampled.UnsupportedAudioFileException). It will save you time.
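One possible sketch of such a guard, rejecting everything except the one layout the decoding code supports (the class and method names here are made up; the AudioFormat calls are the real javax.sound.sampled API):

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.UnsupportedAudioFileException;

public class FormatCheck {
    // Accepts only 16-bit signed little-endian stereo PCM; throws for
    // anything else instead of producing garbage samples.
    public static void requireSupported(AudioFormat fmt)
            throws UnsupportedAudioFileException {
        boolean ok = fmt.getEncoding().equals(AudioFormat.Encoding.PCM_SIGNED)
                && fmt.getSampleSizeInBits() == 16
                && fmt.getChannels() == 2
                && !fmt.isBigEndian();
        if (!ok) {
            throw new UnsupportedAudioFileException("unsupported format: " + fmt);
        }
    }

    public static void main(String[] args) throws Exception {
        // 44.1 kHz, 16-bit, stereo, signed, little-endian.
        AudioFormat fmt = new AudioFormat(44100f, 16, 2, true, false);
        requireSupported(fmt);
        System.out.println("format ok");
    }
}
```

In the question's code this check would run once, right after `audio_in.getFormat()`, before any bytes are interpreted.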

Knight of Ni
  • But it does not return header data or descriptor data, right? So it starts with (in this example) 4 bytes for the first sample of the file, then the second sample's 4 bytes, and so on. So all I need to know is which bytes belong to which side (bytes 1 & 2 for left, 3 & 4 for right) and how to "combine" the two bytes for each side. (I would just calculate sample = 256*firstByte + secondByte, right?) – ruhig brauner Jun 28 '13 at 19:36
  • Maybe this will help: http://blog.bjornroche.com/2013/05/the-abcs-of-pcm-uncompressed-digital.html – Bjorn Roche Jun 28 '13 at 19:48
  • Yes, this helped. ;) I think I got it now. – ruhig brauner Jun 28 '13 at 20:57
  • Sounds good, but performance-wise, is there a way to avoid checking the endianness and the signed status every time? (I mean, not that it would help my program, I don't even work with threads yet.) In C++ I would point to a specific function depending on the encoding... Is there a similar way in Java? (Maybe adding a class with different subclasses?) – ruhig brauner Jun 29 '13 at 10:10
  • Yes. You may check the format once and instantiate a different type of class with polymorphism (a subclass, as you mentioned) or an interface. As you get further into the new language I am sure you will feel the difference... Consider also marking the answer as accepted. – Knight of Ni Jul 15 '13 at 01:59
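The "check once, then dispatch" idea from the last comment could look like this sketch (the `SampleDecoder` interface and its implementations are hypothetical names, not part of javax.sound.sampled):

```java
import javax.sound.sampled.AudioFormat;

// One decoding strategy per byte order; chosen once, used per sample.
interface SampleDecoder {
    short decode(byte[] buf, int pos); // one 16-bit sample at 'pos'
}

class LittleEndianDecoder implements SampleDecoder {
    public short decode(byte[] buf, int pos) {
        return (short) ((buf[pos] & 0xFF) | (buf[pos + 1] << 8));
    }
}

class BigEndianDecoder implements SampleDecoder {
    public short decode(byte[] buf, int pos) {
        return (short) ((buf[pos] << 8) | (buf[pos + 1] & 0xFF));
    }
}

public class DecoderFactory {
    // Inspect the format once; every later call goes through the chosen
    // implementation with no per-sample branching.
    public static SampleDecoder forFormat(AudioFormat fmt) {
        return fmt.isBigEndian() ? new BigEndianDecoder()
                                 : new LittleEndianDecoder();
    }

    public static void main(String[] args) {
        AudioFormat fmt = new AudioFormat(44100f, 16, 2, true, false);
        SampleDecoder d = forFormat(fmt);
        byte[] buf = { 0x01, 0x02 }; // little-endian: 0x0201 = 513
        System.out.println(d.decode(buf, 0)); // prints "513"
    }
}
```

The JIT can devirtualize the single call site, so in practice this costs little more than a direct call while keeping the format check out of the inner loop.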