I'm trying to get the audio byte[] that's created when the TextToSpeech engine synthesises text.
I've tried creating a Visualizer and assigning it an OnDataCaptureListener,
but the byte[] it provides is always the same, so I don't believe the array is connected to the spoken text.
This is my implementation:
AudioManager audioManager = (AudioManager) this.getSystemService(Context.AUDIO_SERVICE);
audioManager.requestAudioFocus(
        focusChange -> Log.d(TAG, "focusChange is: " + focusChange),
        AudioManager.STREAM_MUSIC,
        AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK);
int audioSessionId = audioManager.generateAudioSessionId();
mVisualizer = new Visualizer(audioSessionId);
mVisualizer.setEnabled(false);
mVisualizer.setCaptureSize(Visualizer.getCaptureSizeRange()[0]);
mVisualizer.setDataCaptureListener(
        new Visualizer.OnDataCaptureListener() {
            @Override
            public void onWaveFormDataCapture(Visualizer visualizer,
                                              byte[] bytes, int samplingRate) {
                // here the bytes are always equal to the bytes received in the last call
            }

            @Override
            public void onFftDataCapture(Visualizer visualizer, byte[] bytes, int samplingRate) {
            }
        }, Visualizer.getMaxCaptureRate(), true, true);
mVisualizer.setEnabled(true);
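To rule out a logging mistake on my side, a check along these lines could confirm that consecutive captures really are identical. This is only a minimal sketch: CaptureChangeDetector is a hypothetical helper name, not part of the Android API, and I am assuming it would be called with the bytes argument from onWaveFormDataCapture().

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of the Android SDK): remembers the previous
 * capture buffer and reports whether a newly delivered buffer differs from it.
 */
class CaptureChangeDetector {
    // Copy of the last buffer seen; null until the first capture arrives.
    private byte[] previous;

    /** Returns true for the first capture, or when the content differs from the previous capture. */
    boolean hasChanged(byte[] bytes) {
        boolean changed = previous == null || !Arrays.equals(previous, bytes);
        // Store a copy, in case the caller reuses the same array for the next capture.
        previous = Arrays.copyOf(bytes, bytes.length);
        return changed;
    }
}
```

With one detector instance kept as a field, logging the result of hasChanged(bytes) on every callback would show whether the data ever varies while the text is being spoken.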
I also found that you can use the SynthesisCallback to receive the byte[] via its audioAvailable() method, but I can't seem to implement it properly.
I created a TextToSpeechService, but its onSynthesizeText() method is never called. However, I can tell that the service is running, because onLoadLanguage() is called.
My question in a nutshell: how do I get the byte[] representation of the audio created when the TextToSpeech engine synthesises text?
Thanks in advance.