How do you get the audio byte[] from the synthesised speech created by the TextToSpeech engine?

Question

I'm trying to get the audio byte[] that's created when the TextToSpeech engine synthesises text.

I've tried creating a Visualizer and assigning an OnDataCaptureListener, but the byte[] it provides is always the same, so I don't believe the array is connected to the spoken text.

This is my implementation:

    AudioManager audioManager = (AudioManager) this.getSystemService(Context.AUDIO_SERVICE);

    audioManager.requestAudioFocus(
            focusChange -> Log.d(TAG, "focusChange is: " + focusChange),
            AudioManager.STREAM_MUSIC,
            AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK);

    int audioSessionId = audioManager.generateAudioSessionId();

    mVisualizer = new Visualizer(audioSessionId);
    mVisualizer.setEnabled(false);
    mVisualizer.setCaptureSize(Visualizer.getCaptureSizeRange()[0]);
    mVisualizer.setDataCaptureListener(
            new Visualizer.OnDataCaptureListener() {
                @Override
                public void onWaveFormDataCapture(Visualizer visualizer, byte[] bytes, int samplingRate) {
                    // Here the bytes are always equal to the bytes received in the last call
                }

                @Override
                public void onFftDataCapture(Visualizer visualizer, byte[] bytes, int samplingRate) {
                }
            },
            Visualizer.getMaxCaptureRate(), true, true);
    mVisualizer.setEnabled(true);

I also found that you can use the SynthesisCallback to receive the byte[] via its audioAvailable() method but I can't seem to implement it properly.

I created a TextToSpeechService, but its onSynthesizeText() method is never called. However, I can tell that the service is working because onLoadLanguage() is called.

My question in a nutshell: how do I get the byte[] representation of the audio created when the TextToSpeech engine synthesises text?

Thanks in advance.

I could only find a way that first synthesizes the TTS output to a file and then loads the file back into the target buffer using a WAV-reading library. — KYHSGeekCode, Apr 13 '18 at 09:18

KYHSGeekCode · Answer 1 · 2018-04-13T10:04:55.877

I heard that onAudioAvailable() was deprecated, and my callback is never called either.

So a workaround is:

In the Activity:

    // Release any previous engine instance before creating a new one.
    if (tts != null)
    {
        tts.shutdown();
        tts = null;
    }
    tts = new TextToSpeech(this, this);

In the onInit() method:

    @Override
    public void onInit(int status)
    {
        if (status != TextToSpeech.SUCCESS)
        {
            return; // The engine failed to initialize; nothing to synthesize.
        }

        HashMap<String, String> mTTSMap = new HashMap<String, String>();
        tts.setOnUtteranceProgressListener(new UtteranceProgressListener()
            {
                @Override
                public void onStart(final String utteranceId)
                {
                    Log.d(TAG, "START");
                }

                @Override
                public void onDone(final String utteranceId)
                {
                    // Wake the waiting thread once our utterance has been written.
                    if ("abcde".equals(utteranceId))
                    {
                        synchronized (MainActivity.this)
                        {
                            MainActivity.this.notifyAll();
                        }
                    }
                }

                @Override
                public void onError(final String utteranceId)
                {
                    // This overload is also deprecated...
                }

                @Override
                public void onAudioAvailable(final String id, final byte[] bytes)
                {
                    // Never called!
                    runOnUiThread(new Runnable()
                        {
                            @Override
                            public void run()
                            {
                                Toast.makeText(MainActivity.this, "id:" + id /* "bytes:" + Arrays.toString(bytes) */, Toast.LENGTH_LONG).show();
                                Log.v(TAG, "BYTES");
                            }
                        });
                }
            });

        // new Locale("en_EN") is not a valid language tag, so use Locale.ENGLISH.
        Locale english = Locale.ENGLISH;
        if (tts.isLanguageAvailable(english) >= TextToSpeech.LANG_AVAILABLE)
        {
            tts.setLanguage(english);
        }

        // synthesizeToFile(String, HashMap<String, String>, String) is deprecated;
        // the current overload is synthesizeToFile(CharSequence, Bundle, File, String).
        mTTSMap.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, "abcde");

        synchronized (MainActivity.this)
        {
            // Start synthesis while holding the monitor, so onDone() cannot call
            // notifyAll() before wait() has released it.
            tts.synthesizeToFile("Hello", mTTSMap, "/storage/emulated/0/a.wav");
            try
            {
                // Note: this blocks the calling thread until onDone() fires.
                MainActivity.this.wait();
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
            }
            ReadTheFile();
        }
    }

Then your job is to load a.wav into whatever buffer you want, for example with a library like the one mentioned in this SO answer.
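If pulling in a library feels like overkill, the PCM bytes can also be extracted by hand. The following is only a minimal sketch, assuming the engine wrote an ordinary RIFF/WAVE file; the class and method names (WavPcmReader, readPcm, extractPcm) are made up for illustration and are not part of any Android API. It skips over the header chunks rather than interpreting them, so the sample rate and bit depth would still have to be read from the "fmt " chunk if you need them. Note that java.nio.file.Files is only available on Android from API 26.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: extracts the raw PCM payload from a RIFF/WAVE file. */
final class WavPcmReader {

    private WavPcmReader() {}

    /** Reads the whole file and returns the bytes of its "data" chunk. */
    static byte[] readPcm(Path wavFile) throws IOException {
        return extractPcm(Files.readAllBytes(wavFile));
    }

    /** Walks the RIFF chunk list and returns the contents of the "data" chunk. */
    static byte[] extractPcm(byte[] wav) throws IOException {
        if (wav.length < 12 || !tag(wav, 0).equals("RIFF") || !tag(wav, 8).equals("WAVE")) {
            throw new IOException("Not a RIFF/WAVE file");
        }
        // All multi-byte fields in a RIFF file are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 12; // first chunk starts right after "RIFF" <size> "WAVE"
        while (pos + 8 <= wav.length) {
            String id = tag(wav, pos);
            long size = buf.getInt(pos + 4) & 0xFFFFFFFFL; // unsigned 32-bit chunk size
            int start = pos + 8;
            if (id.equals("data")) {
                // Clamp in case the writer recorded a size larger than the file.
                int end = (int) Math.min((long) wav.length, start + size);
                return Arrays.copyOfRange(wav, start, end);
            }
            // Chunks are padded to an even number of bytes.
            long next = start + size + (size & 1);
            if (next > wav.length) {
                break;
            }
            pos = (int) next;
        }
        throw new IOException("No data chunk found");
    }

    /** Returns the four-character chunk tag stored at the given offset. */
    private static String tag(byte[] bytes, int offset) {
        return new String(bytes, offset, 4, StandardCharsets.US_ASCII);
    }
}
```

With this in place, ReadTheFile() could boil down to something like byte[] pcm = WavPcmReader.readPcm(Paths.get("/storage/emulated/0/a.wav")); with the IOException handled however suits your app.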

Summary:

1. Create the TTS engine.
2. Initialize it.
3. onInit() is called.
4. In onInit(), set up a new HashMap and put the utterance ID in it.
5. Register the listener with setOnUtteranceProgressListener().
6. Synthesize something to a file.
7. Call wait().
8. In the onDone() method, call notifyAll().
9. After wait() returns, read the synthesized file into a buffer.
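
A side note on the wait/notify steps: a bare wait()/notify() pair is easy to get wrong, because a notification sent before the wait begins is lost and wait() may also wake up spuriously. A CountDownLatch remembers that it was counted down, so it might be a safer way to express the same handshake. The sketch below is plain Java and only simulates the engine: fakeSynthesizeToFile and DoneListener are stand-ins I made up for tts.synthesizeToFile() and UtteranceProgressListener.onDone(), not real Android calls.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SynthesisWaitSketch {

    private SynthesisWaitSketch() {}

    /** Stand-in for UtteranceProgressListener.onDone(String). */
    interface DoneListener {
        void onDone(String utteranceId);
    }

    /** Stand-in for an asynchronous engine call such as synthesizeToFile(). */
    static void fakeSynthesizeToFile(ExecutorService engine, String utteranceId, DoneListener listener) {
        // The "engine" reports completion from its own thread, as a real one would.
        engine.submit(() -> listener.onDone(utteranceId));
    }

    /** Starts the fake synthesis and waits for its completion, up to timeoutMs. */
    static boolean synthesizeAndWait(String utteranceId, long timeoutMs) throws InterruptedException {
        ExecutorService engine = Executors.newSingleThreadExecutor();
        CountDownLatch done = new CountDownLatch(1);
        try {
            fakeSynthesizeToFile(engine, utteranceId, id -> {
                // Only react to the utterance we are waiting for.
                if (utteranceId.equals(id)) {
                    done.countDown();
                }
            });
            // Returns true if onDone() arrived in time, even if it fired before await().
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            engine.shutdown();
        }
    }
}
```

In a real app the latch would be counted down inside onDone(), and the await() call would ideally run on a background thread so that the UI thread is never blocked. The timeout keeps the caller from hanging forever if the engine never reports completion.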
