
I'm working on an application that has to process audio files. When using MP3 files I'm not sure how to handle the data (the data I'm interested in is the audio bytes, the ones that represent what we hear).

If I'm using a WAV file I know I have a header (typically 44 bytes) and then the data. When it comes to an MP3, I've read that the file is made up of frames, each frame containing a header and audio data. Is it possible to get all the audio data from an MP3 file?

I'm using Java (I've added MP3SPI, JLayer, and Tritonus) and I'm able to get the bytes from the file, but I'm not sure what these bytes represent or how to handle them.

Brian Tompsett - 汤莱恩
dedalo

3 Answers


From the documentation for MP3SPI:

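import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

// Open the MP3 file; MP3SPI must be on the classpath for this call to recognise it.
File file = new File(filename);
AudioInputStream in = AudioSystem.getAudioInputStream(file);
AudioFormat baseFormat = in.getFormat();
// Ask for 16-bit signed PCM with the same sample rate and channel count as the source.
AudioFormat decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                                            baseFormat.getSampleRate(),
                                            16,                            // bits per sample
                                            baseFormat.getChannels(),
                                            baseFormat.getChannels() * 2,  // frame size in bytes
                                            baseFormat.getSampleRate(),    // frame rate
                                            false);                        // little-endian
AudioInputStream din = AudioSystem.getAudioInputStream(decodedFormat, in);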

You then just read data from din - it will be the "raw" data as per decodedFormat. (See the docs for AudioFormat for more information.)

(Note that this sample code doesn't close the stream or anything like that - use appropriate try/finally blocks as normal.)
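As a rough sketch of the "read data from din" step with the stream closed properly, something like the following should work. The class name `PcmReader` and the helper `readAll` are made up for illustration, and `main` feeds it a stand-in PCM stream built from a byte array so the example runs without MP3SPI installed; with a real MP3 you would pass the `din` obtained above instead.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class PcmReader {

    /** Reads every remaining byte from an already-decoded PCM stream. */
    public static byte[] readAll(AudioInputStream din) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // 4096 is a multiple of the usual frame sizes (2 or 4 bytes for 16-bit audio).
        byte[] buffer = new byte[4096];
        int n;
        while ((n = din.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the decoded stream: 4 frames of silent 16-bit stereo PCM.
        AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                44100f, 16, 2, 4, 44100f, false);
        byte[] fake = new byte[16];
        long frames = fake.length / format.getFrameSize();

        // try-with-resources closes the stream even if reading fails.
        try (AudioInputStream din =
                     new AudioInputStream(new ByteArrayInputStream(fake), format, frames)) {
            byte[] pcm = readAll(din);
            System.out.println("Read " + pcm.length + " bytes of PCM data");
        }
    }
}
```

Holding the whole decoded file in memory is only reasonable for short clips; for long files you would probably process each buffer inside the loop instead.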

Jasper de Vries
Jon Skeet
  • Hi Jon, thanks for your quick answer! In your proposal, is 'decodedFormat' a representation of the mp3 data decoded into another format? If I write "din.read()", am I getting the data bytes in the decoded format? Thanks – dedalo Jun 02 '09 at 09:56
  • Yes. That decodedFormat says "I want you to decode as signed PCM data". – Jon Skeet Jun 02 '09 at 10:13
  • Hi. I followed your advice and it worked. To read the data I use: while ((numBytesRead = din.read(audioBytes)) != -1) {} This reads the bytes from 'din' and stores them in the array audioBytes. I've tried visualizing the data with: while ((numBytesRead = din.read(audioBytes)) != -1) { System.out.println("Bytes Decoded value " + audioBytes[0]);} I have a question about this data: every sample uses 16 bits, that's 2 positions in the array audioBytes, right? How could I get the value of every sample? Does the decoded format (wav) have the 44 header bytes? Thank you very much for your help! – dedalo Jun 03 '09 at 09:16
  • The decoded format here isn't actually wav - it's just the data part. Yes, you'll get one sample per two bytes (and two samples for the same time if it's stereo). Just fetch both bytes and convert each pair into a 16-bit value. Or if you want, you could change the 16 to 8 in the decodedFormat constructor call... – Jon Skeet Jun 03 '09 at 09:35
  • I'm a little confused. If the audio file is stereo, does this mean that in the byte array there are 2 bytes for the 1st sample (left channel) and another 2 bytes for the 1st sample (right channel)? – dedalo Jun 03 '09 at 19:53
  • Yes. (I'm not sure which order they're in, whether it's left then right or right then left, but that's the basic idea.) – Jon Skeet Jun 03 '09 at 20:06
  • Ok, I've checked the order of the bytes in the case the audio file is stereo. I think this is a problem. After getting the audio data sample array I need to start processing it, which involves applying a Hamming window to N samples and then calculating FFT. I'll keep thinking about it. Thanks! – dedalo Jun 04 '09 at 07:10
  • @dedalo: If you only *want* it to decode to mono, change the decodedFormat constructor call. I don't know whether it will pick one channel or other, or mix the two, but it's worth a try. – Jon Skeet Jun 04 '09 at 07:53
  • This is what I tried: AudioFormat decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, baseFormat.getSampleRate(), 16, 1, //baseFormat.getChannels(), baseFormat.getChannels() * 2, baseFormat.getSampleRate(), false); But I got an exception I was not able to solve. – dedalo Jun 04 '09 at 08:07
  • You probably need to change the next argument as well (the frame size). If that doesn't work, please say what the exception was. – Jon Skeet Jun 04 '09 at 08:35
  • I tried changing the frame size argument but none of the values I used worked. It looks like it lets me modify the other arguments, but not the one related to stereo/mono. I think the exception is caused by: numBytesRead = din.read(audioBytes). – dedalo Jun 04 '09 at 21:16
  • Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1 at javazoom.spi.mpeg.sampled.convert.DecodedMpegAudioInputStream$DMAISObuffer.append(Unknown Source) at javazoom.jl.decoder.Obuffer.appendSamples(Unknown Source) at javazoom.jl.decoder.SynthesisFilter.compute_pcm_samples(Unknown Source) at javazoom.jl.decoder.SynthesisFilter.calculate_pcm_samples(Unknown Source) at javazoom.jl.decoder.LayerIIIDecoder.decode(Unknown Source) at javazoom.jl.decoder.LayerIIIDecoder.decodeFrame(Unknown Source) – dedalo Jun 04 '09 at 21:20
  • at javazoom.jl.decoder.Decoder.decodeFrame(Unknown Source) at javazoom.spi.mpeg.sampled.convert.DecodedMpegAudioInputStream.execute(Unknown Source) at org.tritonus.share.TCircularBuffer.read(TCircularBuffer.java:138) at org.tritonus.share.sampled.convert.TAsynchronousFilteredAudioInputStream.read(TAsynchronousFilteredAudioInputStream.java:189) at org.tritonus.share.sampled.convert.TAsynchronousFilteredAudioInputStream.read(TAsynchronousFilteredAudioInputStream.java:175) – dedalo Jun 04 '09 at 21:21
  • Okay, well you'd probably want to debug into that. You may need to handle the stereo to mono conversion yourself. – Jon Skeet Jun 05 '09 at 05:26
  • One last question: when creating the decodedFormat we're choosing little endian (big endian = false). If I write 'true', will the data in decodedFormat be stored in big-endian format? If so, I won't need to manipulate the bytes in order to get a double value for each sample. – dedalo Jun 17 '09 at 16:47
  • At the end of the process I'll get a number of arrays containing the mel coefficients, is it possible to use a k-mean algorithm? – dedalo Jun 21 '09 at 21:51
  • @dedalo: In terms of big-endianness: Don't know off hand, but I'd expect so. As for k-mean... no idea, not knowing what it is. – Jon Skeet Jun 21 '09 at 22:49
  • I tried this sample and got this error: javax.sound.sampled.UnsupportedAudioFileException: could not get audio input stream from input file at javax.sound.sampled.AudioSystem.getAudioInputStream(AudioSystem.java:1187) – Anuj Kulkarni Oct 06 '13 at 00:29
  • I had the same problem as @BigShow and checked the documentation [here](https://docs.oracle.com/javase/7/docs/api/javax/sound/sampled/AudioFileFormat.Type.html) and the list of supported formats **does not include mpeg 3**. To fix this I converted my mpeg 3 file to wave and the code worked flawlessly. – Harry Saliba Jan 19 '16 at 20:44
  • @HarrySaliba - if you install the library that's linked in the answer, it *adds mp3 to the supported formats*. – Jules Mar 05 '18 at 09:39
  • I have installed mp2spi, tritonus and jlayer, but when I call AudioInputStream in = AudioSystem.getAudioInputStream("file.mp3"); in.getFrameLength() always returns -1. Any help? – Programmer dude Aug 28 '19 at 07:07
  • @Programmerdude: Please ask a new question with a [mcve] rather than adding comments to a question that's over 10 years old. – Jon Skeet Aug 28 '19 at 08:01
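The sample-level questions in the comments above (how to turn byte pairs into 16-bit values, what the endianness flag changes, and how to get mono from stereo) can be sketched in plain Java without any audio library. This is only an illustration under stated assumptions: the class `PcmSamples` and both method names are hypothetical, the input is assumed to be signed 16-bit PCM as requested by `decodedFormat`, and the stereo data is assumed to be interleaved one sample per channel per frame (conventionally left first, though the thread does not confirm the order). Averaging the two channels is one simple way to downmix, not necessarily what a decoder would do.

```java
import java.util.Arrays;

public class PcmSamples {

    /**
     * Combines each pair of bytes into one signed 16-bit sample.
     * bigEndian must match the last argument passed to the AudioFormat constructor.
     */
    public static short[] toSamples(byte[] pcm, int length, boolean bigEndian) {
        short[] samples = new short[length / 2];
        for (int i = 0; i < samples.length; i++) {
            int first = pcm[2 * i];
            int second = pcm[2 * i + 1];
            int hi = bigEndian ? first : second;   // most significant byte keeps its sign
            int lo = bigEndian ? second : first;   // least significant byte is masked to 0..255
            samples[i] = (short) ((hi << 8) | (lo & 0xFF));
        }
        return samples;
    }

    /** Averages each pair of interleaved stereo samples into a single mono sample. */
    public static short[] stereoToMono(short[] interleaved) {
        short[] mono = new short[interleaved.length / 2];
        for (int i = 0; i < mono.length; i++) {
            mono[i] = (short) ((interleaved[2 * i] + interleaved[2 * i + 1]) / 2);
        }
        return mono;
    }

    public static void main(String[] args) {
        // Two little-endian stereo frames: (16, 48) and (-1, 1).
        byte[] pcm = {0x10, 0x00, 0x30, 0x00, (byte) 0xFF, (byte) 0xFF, 0x01, 0x00};
        short[] samples = toSamples(pcm, pcm.length, false);
        short[] mono = stereoToMono(samples);
        System.out.println(Arrays.toString(samples)); // [16, 48, -1, 1]
        System.out.println(Arrays.toString(mono));    // [32, 0]
    }
}
```

When used with a read loop, pass the number of bytes actually read as `length`, since the last buffer is usually only partly filled. Whichever byte order is chosen, the bytes still have to be combined into samples; the flag only decides which byte of each pair is the high one.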

The data you want are the actual samples, whereas MP3 stores the audio in a compressed, encoded form. So, as others have said, you need a library to decode the MP3 data into actual samples for your purpose.

sybreon

As mentioned in the other answers, you need a decoder to decode MP3 into regular audio samples.

One popular option would be JavaLayer (LGPL).

sleske