
I'm trying to get the audio byte[] that's created when the TextToSpeech engine synthesizes text.

I've tried creating a Visualizer and assigning an OnDataCaptureListener, but the byte[] it provides is always the same, so I don't believe the array is connected to the spoken text.

This is my implementation:

    AudioManager audioManager = (AudioManager) this.getSystemService(Context.AUDIO_SERVICE);

    audioManager.requestAudioFocus(
            focusChange -> Log.d(TAG, "focusChange is: " + focusChange),
            AudioManager.STREAM_MUSIC,
            AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK);

    int audioSessionId = audioManager.generateAudioSessionId();

    mVisualizer = new Visualizer(audioSessionId);
    mVisualizer.setEnabled(false); // capture size can only be set while disabled
    mVisualizer.setCaptureSize(Visualizer.getCaptureSizeRange()[0]);

    mVisualizer.setDataCaptureListener(new Visualizer.OnDataCaptureListener() {

        public void onWaveFormDataCapture(Visualizer visualizer, byte[] bytes, int samplingRate) {
            // here the bytes are always equal to the bytes received in the last call
        }

        public void onFftDataCapture(Visualizer visualizer, byte[] bytes, int samplingRate) {
        }
    }, Visualizer.getMaxCaptureRate(), true, true);

    mVisualizer.setEnabled(true);

I also found that you can use the SynthesisCallback to receive the byte[] via its audioAvailable() method, but I can't seem to implement it properly.

I created a TextToSpeechService, but its onSynthesizeText() method is never called. However, I can tell that the service is working, as its onLoadLanguage() is called.

My question in a nutshell: How do I get the byte[] representation of the audio created when the TextToSpeech engine synthesizes text?

Thanks in advance.

Micah Simmons
  • I could only find a way that first synthesizes the TTS to a file and then loads the file again into the target buffer using a WAV-reading library. – KYHSGeekCode Apr 13 '18 at 09:18

1 Answer


I heard that onAudioAvailable() was deprecated, and my callback is never called either.

So a workaround is:

  1. In Activity:

        try {
            tts.shutdown();
            tts = null;
        } catch (Exception e) {
        }
        tts = new TextToSpeech(this, this);
    
  2. In the onInit() method:

        @Override
        public void onInit(int status) {
            HashMap<String, String> mTTSMap = new HashMap<String, String>();
            tts.setOnUtteranceProgressListener(new UtteranceProgressListener() {

                @Override
                public void onStart(final String utteranceId) {
                    Log.e(TAG, "START");
                }

                @Override
                public void onDone(final String utteranceId) {
                    if (utteranceId.compareTo("abcde") == 0) {
                        synchronized (MainActivity.this) {
                            MainActivity.this.notifyAll();
                        }
                    }
                }

                @Override
                public void onError(final String utteranceId) {
                    // this overload is also deprecated...
                }

                @Override
                public void onAudioAvailable(final String id, final byte[] bytes) {
                    // never called!
                    runOnUiThread(new Runnable() {

                        @Override
                        public void run() {
                            Toast.makeText(MainActivity.this, "id:" + id /*"bytes:" + Arrays.toString(bytes)*/, Toast.LENGTH_LONG).show();
                            Log.v(TAG, "BYTES");
                        }
                    });
                    //super.onAudioAvailable(id, bytes);
                }
            });

            Locale enEn = new Locale("en", "EN");
            if (tts.isLanguageAvailable(enEn) == TextToSpeech.LANG_AVAILABLE) {
                tts.setLanguage(enEn);
            }

            // Note: the String/HashMap overload of synthesizeToFile() is deprecated;
            // on API 21+ you can use synthesizeToFile(CharSequence, Bundle, File, String) instead.
            mTTSMap.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, "abcde");
            tts.synthesizeToFile("Hello", mTTSMap, "/storage/emulated/0/a.wav");

            // wait() blocks here until onDone() calls notifyAll() for our utterance id.
            synchronized (MainActivity.this) {
                try {
                    MainActivity.this.wait();
                } catch (InterruptedException e) {
                }
                ReadTheFile();
            }
        }
    

Then all that remains is to load a.wav into the buffer you want, using a WAV-reading library like the one mentioned in this SO answer.
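If you don't want to pull in a library, here is a minimal plain-Java sketch of that last step. It assumes the file is a canonical PCM WAV with a standard 44-byte header (real files can carry extra chunks, so a proper library is safer); the class name `WavReader` is just an example:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Minimal WAV reader: skips the canonical 44-byte RIFF/WAVE header
// and returns the remaining raw PCM bytes.
public class WavReader {
    public static byte[] readPcm(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        byte[] header = new byte[44];   // canonical PCM WAV header size
        dis.readFully(header);          // discard the header

        ByteArrayOutputStream pcm = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = dis.read(buf)) != -1) {
            pcm.write(buf, 0, n);       // accumulate the PCM payload
        }
        return pcm.toByteArray();
    }
}
```

On Android you would pass it a `FileInputStream` opened on the synthesized file, e.g. `WavReader.readPcm(new FileInputStream("/storage/emulated/0/a.wav"))`.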

Summary:

  1. Create the TTS engine.
  2. Initialize it.
  3. onInit() is called.
  4. In onInit(), set up a new HashMap and put the utterance id in it.
  5. Register an UtteranceProgressListener via setOnUtteranceProgressListener().
  6. Synthesize something to a file.
  7. Call wait().
  8. In the onDone() method, call notify().
  9. After wait() returns, read the synthesized file into a buffer.
KYHSGeekCode