I'm trying to take audio from an external microphone (mounted on a robot, in this case) and stream it into Unity to be played in a scene. I'm fairly certain this audio is encoded in the mp3 format with a sample rate of 16000 Hz and a bitrate of 192 kbps.
I'm able to get this audio as a byte array (that seems to always be little-endian) in Unity, and I'd like to convert it to a float array with each value ranging from -1.0f to +1.0f so that I can use AudioClip.SetData to play it in the Unity scene. So far I haven't been able to do this.
My first attempt was based on this StackOverflow answer, "create AudioClip from byte[]", which uses the following function for the conversion:
private float[] ConvertByteToFloat(byte[] array) {
    float[] floatArr = new float[array.Length / 4];
    for (int i = 0; i < floatArr.Length; i++) {
        if (BitConverter.IsLittleEndian) {
            Array.Reverse(array, i * 4, 4);
        }
        floatArr[i] = BitConverter.ToSingle(array, i * 4) / 0x80000000;
    }
    return floatArr;
}
I then invoked this like so:
scaledAudio = ConvertByteToFloat(audioData);
AudioClip audioClip = AudioClip.Create("RobotAudio", scaledAudio.Length, 1, 16000, false);
audioClip.SetData(scaledAudio, 0);
AudioSource.PlayClipAtPoint(audioClip, robot.transform.position);
But the result was a lot of static, and on logging some of the outputs, I realized that many of the values were NaN.
I read somewhere that mp3 audio could be extracted using the BitConverter.ToInt16() function, so I changed the ConvertByteToFloat function accordingly:
private float[] ConvertByteToFloat16(byte[] array) {
    float[] floatArr = new float[array.Length / 2];
    for (int i = 0; i < floatArr.Length; i++) {
        if (BitConverter.IsLittleEndian) {
            Array.Reverse(array, i * 2, 2);
        }
        floatArr[i] = (float) (BitConverter.ToInt16(array, i * 2) / 32767f);
    }
    return floatArr;
}
[Note: the result is divided by 32767f because I read this is the maximum value that can occur and I want to scale it down to between -1.0f and 1.0f]
The numbers from this look much more promising. They are indeed all between -1.0f and 1.0f. But when I attempt to play the audio with Unity, all I hear is static.
The issue almost definitely seems to be in the conversion of the byte[] to the float[], but I could've made a mistake in setting the data or the player for the AudioClip or the AudioSource.
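[To rule out the byte handling itself, here is a minimal sketch of the 16-bit conversion written without the in-place Array.Reverse. It assumes the bytes are raw signed 16-bit little-endian PCM samples, which I have not confirmed; if the stream really is mp3, the bytes are compressed frames and no reinterpretation like this can produce playable audio. The names PcmSketch and Pcm16LeToFloat are made up for illustration.]

```csharp
using System;

public static class PcmSketch
{
    // Hypothetical helper: treats `bytes` as signed 16-bit little-endian PCM
    // (an assumption about the data, not something the source confirms).
    public static float[] Pcm16LeToFloat(byte[] bytes)
    {
        int sampleCount = bytes.Length / 2; // a trailing odd byte is ignored
        float[] samples = new float[sampleCount];
        for (int i = 0; i < sampleCount; i++)
        {
            // Assemble low byte + high byte explicitly, so the result does not
            // depend on BitConverter.IsLittleEndian and the input is not modified.
            short value = unchecked((short)(bytes[2 * i] | (bytes[2 * i + 1] << 8)));
            // Dividing by 32768 maps the full Int16 range onto [-1.0, 1.0).
            samples[i] = value / 32768f;
        }
        return samples;
    }
}
```

[On a little-endian machine, reversing bytes that are already little-endian swaps them into the wrong order before BitConverter reads them, so the values can still land between -1.0f and 1.0f while no longer matching the signal. The sketch divides by 32768f rather than 32767f so that the most negative Int16 value maps to exactly -1.0f.]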
Any help/suggestions are MUCH appreciated!
[Additional resources: The byte[] that I get in Unity is produced by this capture program: https://github.com/ros-drivers/audio_common/blob/master/audio_capture/src/audio_capture.cpp There is a related script that takes the data encoded by the capture program and plays it (https://github.com/ros-drivers/audio_common/blob/master/audio_play/src/audio_play.cpp). This works just fine, so if I could replicate the decoding functionality of the audio_play script in that second link, it seems like I'd be good to go!]
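[Since I'm only "fairly certain" about the mp3 encoding, one thing I can check is whether a received chunk begins like an mp3 stream. The sketch below looks for an ID3v2 tag or a valid-looking MPEG Layer III frame header at the very start of the buffer. It is only a heuristic, and it assumes each chunk starts on a frame boundary, which I have not verified. Mp3Sniffer and StartsLikeMp3 are made-up names.]

```csharp
using System;

public static class Mp3Sniffer
{
    // Hypothetical heuristic: returns true if `data` begins with an ID3v2 tag
    // or with bytes that form a plausible MPEG audio Layer III frame header.
    public static bool StartsLikeMp3(byte[] data)
    {
        if (data == null || data.Length < 3) return false;

        // ID3v2 tags begin with the ASCII characters "ID3".
        if (data[0] == (byte)'I' && data[1] == (byte)'D' && data[2] == (byte)'3') return true;

        // A frame header starts with 11 set bits (the frame sync).
        if (data[0] != 0xFF || (data[1] & 0xE0) != 0xE0) return false;

        int version = (data[1] >> 3) & 0x03;         // 1 is a reserved value
        int layer = (data[1] >> 1) & 0x03;           // 1 means Layer III
        int bitrateIndex = (data[2] >> 4) & 0x0F;    // 15 is invalid
        int sampleRateIndex = (data[2] >> 2) & 0x03; // 3 is reserved

        return version != 1 && layer == 1 && bitrateIndex != 15 && sampleRateIndex != 3;
    }
}
```

[If this returns true for the incoming chunks, the data would need to go through an actual mp3 decoder before AudioClip.SetData, rather than through a byte-to-float conversion.]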