I have a big batch of files I'd like to run recognition on using CMU Sphinx 4. Sphinx requires the following format:

  • 16 kHz
  • 16 bit
  • mono
  • little-endian

My files are something like 44100 Hz, 32-bit stereo MP3 files. I tried using Tritonus, and then its updated version JavaZoom, to convert using code from bakuzen. However, AudioSystem.getAudioInputStream(File) throws an UnsupportedAudioFileException, and I haven't been able to figure out why, so I have moved on.

Now I am trying ffmpeg. The command ffmpeg -i input.mp3 -ac 1 -ab 16 -ar 16000 output.wav seems like it should do the trick (except for little endian), but when I check the output with Audacity, it still labels it as "32-bit float". The command I found on this site also uses -acodec pcm_s16le, which from its name seems to be outputting 16 bit little endian; however, Audacity still tells me the output is 32 bit float.

Can anyone tell me how to convert audio files into the format required by CMU Sphinx 4?

Nikolay Shmyrev
Nate Glenn

1 Answer

Did you actually try the output from ffmpeg in CMU Sphinx 4? 32-bit float is probably your default sampling format in Audacity (Edit > Preferences > Quality). I'm guessing it converts any imported file to these settings, so it may not be reporting the parameters of the actual file, but perhaps the working file in Audacity.

Remove -ab 16. That option sets the audio bitrate, not the bit depth, so it would instruct the encoder to use 16 bits/s, and ffmpeg will ignore it for pcm_s16le anyway. So your command will look like:

ffmpeg -i input.mp3 -acodec pcm_s16le -ac 1 -ar 16000 output.wav
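To see why a bitrate option has nothing to act on here: for uncompressed PCM the bitrate is fixed by sample rate × bit depth × channels. A small sketch of that arithmetic (pcm_bitrate_kbps is a hypothetical helper name, not an ffmpeg feature):

```shell
#!/bin/sh
# pcm_bitrate_kbps RATE_HZ BITS CHANNELS
# Uncompressed PCM bitrate in kb/s = sample rate * bit depth * channels / 1000.
# Hypothetical helper for illustration only.
pcm_bitrate_kbps() {
    echo $(( $1 * $2 * $3 / 1000 ))
}

# 16 kHz, 16 bit, mono:
pcm_bitrate_kbps 16000 16 1   # prints 256
```

For the target format this gives 256 kb/s, whatever value is passed to -ab.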

To convert all mp3 files in a directory in Linux:

for f in *.mp3; do ffmpeg -i "$f" -acodec pcm_s16le -ac 1 -ar 16000 "${f%.mp3}.wav"; done
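If the batch is large you may want to be able to re-run it without redoing finished files. A rough sketch of the same loop as a function, under a few assumptions: convert_dir and the FFMPEG override variable are names I made up for this example (ffmpeg itself does not read FFMPEG), and files whose .wav already exists are simply skipped:

```shell
#!/bin/sh
# Hypothetical helper: convert every .mp3 in a directory to 16 kHz, 16-bit,
# mono, little-endian WAV, skipping files that already have a .wav next to them.
# FFMPEG is an assumption of this sketch: set FFMPEG=echo for a dry run that
# only prints the arguments instead of running ffmpeg.
convert_dir() {
    dir=${1:-.}
    for f in "$dir"/*.mp3; do
        [ -e "$f" ] || continue      # the glob matched nothing
        out="${f%.mp3}.wav"          # same renaming as the one-liner above
        [ -e "$out" ] && continue    # already converted, leave it alone
        "${FFMPEG:-ffmpeg}" -i "$f" -acodec pcm_s16le -ac 1 -ar 16000 "$out"
    done
}
```

Calling convert_dir /path/to/audio converts that directory; with FFMPEG=echo set first, it only lists what would be run.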

Or Windows:

for /r %i in (*.mp3) do ffmpeg -i "%i" -acodec pcm_s16le -ac 1 -ar 16000 "%i.wav"

In a Windows batch file:

for /r %%i in (*.mp3) do ffmpeg -i "%%i" -acodec pcm_s16le -ac 1 -ar 16000 "%%i.wav"

You can see file information with file, ffmpeg, ffprobe, mediainfo among other utilities:

$ file hjl0bC.wav 
hjl0bC.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz

$ ffmpeg -i hjl0bC.wav
[...]
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
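If none of those tools are at hand, the same three values can be read straight from the WAV header. This is only a sketch and assumes the canonical layout where the "fmt " chunk directly follows the 12-byte RIFF header (channels at byte 22, sample rate at byte 24, bits per sample at byte 34). The function names are hypothetical:

```shell
#!/bin/sh
# le_field FILE OFFSET LENGTH
# Decode an unsigned little-endian integer by reading it byte by byte with od,
# so the result does not depend on the host's byte order.
le_field() {
    value=0
    scale=1
    for byte in $(od -An -t u1 -j "$2" -N "$3" "$1"); do
        value=$((value + byte * scale))
        scale=$((scale * 256))
    done
    echo "$value"
}

# wav_format FILE: print "<channels> <sample_rate> <bits_per_sample>".
# Assumes the "fmt " chunk is the first chunk in the file.
wav_format() {
    echo "$(le_field "$1" 22 2) $(le_field "$1" 24 4) $(le_field "$1" 34 2)"
}

# is_sphinx_ready FILE: succeed only for mono, 16000 Hz, 16-bit files.
is_sphinx_ready() {
    [ "$(wav_format "$1")" = "1 16000 16" ]
}
```

For the file above, wav_format hjl0bC.wav should print 1 16000 16 if the header has that layout. It does not check the codec tag, so file or ffprobe remain the more thorough check.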
llogan
  • Thank you, this appears to be the correct format. My output files still do not run with Sphinx 4, however. May have to ask @Nikolay Shmyrev directly... – Nate Glenn Dec 04 '12 at 03:48
  • The format was right. My file just had zero energy level regions, so once I added dither into the frontend everything worked great. – Nate Glenn Dec 04 '12 at 20:55
  • @NateGlenn I added your edit that was rejected by other users. I'm not a Windows user, so I didn't test it. – llogan Dec 04 '12 at 21:23
  • Thanks. I guess if my edits are being rejected that I need to review editing policy. – Nate Glenn Dec 04 '12 at 22:20