I want to play the output of Android's TextToSpeech.synthesizeToFile using AudioTrack, but I fear I'm passing AudioTrack's builders the wrong parameters. I copied one of the files generated by TextToSpeech.synthesizeToFile off the device with adb and put it in this GitHub file, in case you want to see TextToSpeech.synthesizeToFile's output for yourself. When I run play tempSoundFile8290688667049742717.wav
on Linux, the file plays the text I wrote ("hello world") and prints the following:
play WARN alsa: can't encode 0-bit Unknown or not applicable
tempSoundFile8290688667049742717.wav:
File Size: 39.4k Bit Rate: 353k
Encoding: Signed PCM
Channels: 1 @ 16-bit
Samplerate: 22050Hz
Replaygain: off
Duration: 00:00:00.89
In:100% 00:00:00.89 [00:00:00.00] Out:19.7k [!=====|=====!] Clip:0
Done.
Accordingly, I'm setting AudioTrack's parameters as follows:
private AudioDeviceInfo findAudioDevice(int deviceFlag, int deviceType) {
    AudioManager manager = (AudioManager) this.context.getSystemService(Context.AUDIO_SERVICE);
    AudioDeviceInfo[] adis = manager.getDevices(deviceFlag);
    for (AudioDeviceInfo adi : adis) {
        if (adi.getType() == deviceType) {
            return adi;
        }
    }
    return null;
}

AudioDeviceInfo mAudioOutputDevice = findAudioDevice(AudioManager.GET_DEVICES_OUTPUTS,
        AudioDeviceInfo.TYPE_BUS);
AudioAttributes.Builder audioAttributesBuilder = new AudioAttributes.Builder().
        setUsage(AudioAttributes.USAGE_VOICE_COMMUNICATION_SIGNALLING).
        setContentType(AudioAttributes.CONTENT_TYPE_SPEECH).
        setFlags(AudioAttributes.FLAG_AUDIBILITY_ENFORCED);
attributes = audioAttributesBuilder.build();
int minBufferSize = AudioTrack.getMinBufferSize(22050, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
AudioTrack.Builder atBuilder = new AudioTrack.Builder();
//builder.setAudioAttributes()
AudioFormat.Builder afBuilder = new AudioFormat.Builder();
afBuilder.setEncoding(AudioFormat.ENCODING_PCM_16BIT)
        .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
        .setSampleRate(22050);
atBuilder.setAudioFormat(afBuilder.build())
        .setTransferMode(AudioTrack.MODE_STREAM)
        .setBufferSizeInBytes(minBufferSize)
        .setAudioAttributes(attributes);
at = atBuilder.build();
at.setPreferredDevice(mAudioOutputDevice);

File myFile = File.createTempFile("tempSoundFile", ".wav");
myFile.deleteOnExit();
myFile.setWritable(true);
myFile.setReadable(true);
Later, I play the file using code taken from here:
/**
 * Code taken from here:
 * https://stackoverflow.com/questions/7372813/android-audiotrack-playing-wav-file-getting-only-white-noise
 */
private void playWav(){
    Log.d(TAG, "Playing speech to text wav file");
    String filepath = this.myFile.getAbsolutePath();
    int i = 0;
    int BUFFER_SIZE = 512;
    byte[] s = new byte[BUFFER_SIZE];
    try {
        Log.i(TAG, "file path is: " + filepath);
        FileInputStream fin = new FileInputStream(filepath);
        DataInputStream dis = new DataInputStream(fin);
        at.play();
        while((i = dis.read(s, 0, BUFFER_SIZE)) > -1){
            at.write(s, 0, i);
            Log.v(TAG, Arrays.toString(s));
        }
        at.stop();
        at.release();
        dis.close();
        fin.close();
    } catch (FileNotFoundException e) {
        // TODO
        e.printStackTrace();
    } catch (IOException e) {
        // TODO
        e.printStackTrace();
    }
}
Of course, calls to these are spread across different async callbacks, as you can see in my code, but I've checked all of that with log statements and the debugger and don't see any issues. playWav() is called when I expect it to be, but nothing plays.
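One detail of the playWav() loop above that may be worth ruling out: it reads the file from byte 0, so the WAV header is written to AudioTrack as if it were sample data. That would more plausibly cause a short click than silence, so it may well not be the cause here. The following is a minimal sketch of skipping the header first, assuming the file uses the canonical 44-byte PCM header. `WavPayload`, `PcmSink`, and `streamPcm` are hypothetical names introduced for illustration, not Android APIs.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration; not part of the Android SDK. */
final class WavPayload {
    /** Assumption: canonical PCM WAV header with the "data" chunk at offset 36. */
    static final int HEADER_SIZE = 44;

    /** Receives raw PCM bytes; on Android this could forward to AudioTrack.write. */
    interface PcmSink {
        void write(byte[] data, int offset, int length);
    }

    /**
     * Skips the header, then forwards the remaining bytes to the sink in chunks.
     * Returns the number of PCM bytes forwarded.
     */
    static long streamPcm(InputStream in, PcmSink sink, int bufferSize) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        // readFully throws EOFException if the file is shorter than the header.
        dis.readFully(new byte[HEADER_SIZE]);

        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = dis.read(buffer, 0, buffer.length)) != -1) {
            if (n > 0) {
                sink.write(buffer, 0, n); // only the bytes actually read
                total += n;
            }
        }
        return total;
    }
}
```

On Android, the sink could be passed as `(data, off, len) -> at.write(data, off, len)`, with `at.play()` called before and `at.stop()` after, as in playWav().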
edit:
My primary motivation for using AudioTrack is to make TextToSpeech compatible with the Raspberry Pi voice kit's Android Things library. Using AudioTrack will allow me to play TextToSpeech output over I2S (or any speaker I choose).
edit 2, a deeper look:
According to this website, WAV files have a 44-byte header that records all of these parameters. In this header (all fields little endian):
- at position 20, 2 bytes give the format type (1 for PCM)
- at position 22, 2 bytes give the number of channels (1 for mono, 2 for stereo)
- at position 24, 4 bytes give the sample rate
- and finally, at position 34, 2 bytes give the bits per sample
Here is a hex dump of the above-mentioned file's header:
$ hd -n 44 tempSoundFile8290688667049742717.wav
00000000  52 49 46 46 f8 99 00 00  57 41 56 45 66 6d 74 20  |RIFF....WAVEfmt |
00000010  10 00 00 00 01 00 01 00  22 56 00 00 44 ac 00 00  |........"V..D...|
00000020  02 00 10 00 64 61 74 61  d4 99 00 00              |....data....|
0000002c
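
The header fields can also be decoded in code rather than by eye. This is a minimal sketch, assuming the canonical 44-byte layout ("fmt " chunk at offset 12, "data" chunk at offset 36). `WavHeader` is a hypothetical helper class introduced for illustration, not an Android API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of the Android SDK. */
final class WavHeader {
    final int audioFormat;   // offset 20, 2 bytes: 1 means uncompressed PCM
    final int channels;      // offset 22, 2 bytes
    final int sampleRate;    // offset 24, 4 bytes
    final int byteRate;      // offset 28, 4 bytes: sampleRate * channels * bitsPerSample / 8
    final int bitsPerSample; // offset 34, 2 bytes
    final long dataSize;     // offset 40, 4 bytes: size of the PCM payload in bytes

    private WavHeader(ByteBuffer b) {
        audioFormat = b.getShort(20) & 0xFFFF;
        channels = b.getShort(22) & 0xFFFF;
        sampleRate = b.getInt(24);
        byteRate = b.getInt(28);
        bitsPerSample = b.getShort(34) & 0xFFFF;
        dataSize = b.getInt(40) & 0xFFFFFFFFL;
    }

    /** Parses the first 44 bytes of a file; rejects anything not in the assumed layout. */
    static WavHeader parse(byte[] header) {
        if (header.length < 44) {
            throw new IllegalArgumentException("need at least 44 bytes");
        }
        if (!tag(header, 0).equals("RIFF") || !tag(header, 8).equals("WAVE")
                || !tag(header, 12).equals("fmt ") || !tag(header, 36).equals("data")) {
            throw new IllegalArgumentException("not a canonical 44-byte WAV header");
        }
        // All multi-byte header fields are little endian.
        return new WavHeader(ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN));
    }

    private static String tag(byte[] bytes, int offset) {
        return new String(bytes, offset, 4, StandardCharsets.US_ASCII);
    }
}
```

Applied to the 44 bytes dumped above, this gives format 1 (PCM), 1 channel, 22050 Hz, and 16 bits per sample, which matches what play reports. The data size field (0x99d4 = 39380 bytes) divided by the byte rate (0xac44 = 44100 bytes per second) gives about 0.89 s, which agrees with the reported duration.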