I'm working on a Discord bot project. I would like the bot to listen in a Discord voice channel and process voice commands.
I'm using an open-source speech-to-text Java library called Sphinx (https://cmusphinx.github.io/). I'm receiving audio data from the Discord server via the JDA library (https://github.com/DV8FromTheWorld/JDA).
This class (https://github.com/DV8FromTheWorld/JDA/blob/master/src/main/java/net/dv8tion/jda/core/audio/AudioReceiveHandler.java#L65) is used for receiving audio.
The method handleCombinedAudio(CombinedAudio audio) is called every 20 ms, and a byte[] of the audio data can be retrieved with audio.getBytes[].
The voice recognition software requires an InputStream over a byte array to properly recognize data. I have a method that concatenates byte arrays to form 3-second chunks of sound, each of which is processed by the voice recognition software. The problem I've run into is a mismatch of sound formats.
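For context, the chunking step can be sketched roughly as below. This is a simplified illustration, not my exact code: the class name ChunkBuffer and its constants are made up for this post, and the sizes assume the audio arrives as 48 kHz, 16-bit, stereo PCM in 20 ms packets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

/** Hypothetical helper (not part of JDA or Sphinx) that gathers 20 ms packets into 3 s chunks. */
final class ChunkBuffer {
    // 48000 frames/s * 0.020 s * 2 channels * 2 bytes per sample = 3840 bytes per packet
    static final int BYTES_PER_PACKET = 48000 / 50 * 2 * 2;
    // 48000 frames/s * 3 s * 2 channels * 2 bytes per sample = 576000 bytes per chunk
    static final int CHUNK_BYTES = 48000 * 3 * 2 * 2;

    private final ByteArrayOutputStream pending = new ByteArrayOutputStream(CHUNK_BYTES);

    /**
     * Appends one packet. Returns a stream over the accumulated audio once at
     * least 3 seconds have been collected, otherwise returns null.
     */
    ByteArrayInputStream append(byte[] packet) {
        pending.write(packet, 0, packet.length);
        if (pending.size() < CHUNK_BYTES) {
            return null; // not enough audio yet
        }
        byte[] chunk = pending.toByteArray();
        pending.reset(); // start collecting the next chunk
        return new ByteArrayInputStream(chunk);
    }
}
```

With these numbers, a chunk is emitted on every 150th call to append.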
Sphinx requires: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16-bit, mono, 16000 Hz
Discord returns: 48 kHz, 16-bit, stereo, signed, big-endian PCM
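To make the gap between the two formats concrete, here is a rough sketch of the kind of conversion I imagine is needed, written by hand rather than with any JDA or Sphinx API. The class and method names are hypothetical. It averages the two channels into mono, averages every 3 consecutive frames to go from 48 kHz to 16 kHz (a crude stand-in for a proper low-pass filter and resampler), and writes the result little-endian. It produces raw PCM only and does not add a RIFF/WAVE header.

```java
/** Hypothetical helper; the names are illustrative and not part of JDA or Sphinx. */
final class DiscordToSphinx {
    private static final int BYTES_PER_STEREO_FRAME = 4; // 2 channels * 2 bytes
    private static final int DECIMATION = 3;             // 48000 Hz / 16000 Hz

    /**
     * Converts 48 kHz 16-bit stereo signed big-endian PCM into
     * 16 kHz 16-bit mono signed little-endian PCM.
     * Trailing bytes that do not fill a group of 3 stereo frames are dropped.
     */
    static byte[] convert(byte[] in) {
        int outSamples = in.length / (BYTES_PER_STEREO_FRAME * DECIMATION);
        byte[] out = new byte[outSamples * 2];
        for (int i = 0; i < outSamples; i++) {
            int base = i * BYTES_PER_STEREO_FRAME * DECIMATION;
            int sum = 0;
            for (int f = 0; f < DECIMATION; f++) {
                int p = base + f * BYTES_PER_STEREO_FRAME;
                // Big-endian: signed high byte first, then unsigned low byte.
                int left = (in[p] << 8) | (in[p + 1] & 0xFF);
                int right = (in[p + 2] << 8) | (in[p + 3] & 0xFF);
                sum += left + right;
            }
            // Average over 2 channels * 3 frames; the result fits in 16 bits.
            int sample = sum / (2 * DECIMATION);
            out[2 * i] = (byte) sample;            // little-endian: low byte first
            out[2 * i + 1] = (byte) (sample >> 8); // then high byte
        }
        return out;
    }
}
```

Under these assumptions, one 20 ms packet of 3840 bytes becomes 640 bytes, and a 3-second chunk of 576000 bytes becomes 96000 bytes.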
How do I convert the received byte[] array from Discord into the proper format for Sphinx?
Any ideas would be greatly appreciated. Please be specific in answers.