Bypassing Google TTS Engine initialization lag in Android

Question

I am trying to play speech through a TextToSpeech object when a specific event is triggered on the phone.

However, I am facing issues with the default Google TTS engine that is installed on most phones. As of now, I play some text immediately after the TextToSpeech object is initialized and shut down the resource as soon as the speech is completed, as in the following code:

public class VoiceGenerator {

    private static TextToSpeech voice = null;

    private final Context context;

    public VoiceGenerator(Context context) {
        this.context = context;
    }

    // text is final so the anonymous listener below can capture it
    public void voiceInit(final String text) {
        try {
            if (voice == null) {
                // Create the engine off the main thread; onInit() fires once it is ready
                new Thread(new Runnable() {
                    @Override
                    public void run() {
                        voice = new TextToSpeech(context, new TextToSpeech.OnInitListener() {
                            @Override
                            public void onInit(final int status) {
                                try {
                                    if (status != TextToSpeech.ERROR) {
                                        voice.setLanguage(Locale.US);
                                        Log.d("VoiceTTS", "TTS being initialized");
                                        HashMap<String, String> p = new HashMap<String, String>();
                                        p.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, "ThisUtterance");

                                        // Register the listener before speaking so no callback is missed
                                        voice.setOnUtteranceProgressListener(new UtteranceProgressListener() {
                                            @Override
                                            public void onStart(String utteranceId) {
                                            }

                                            @Override
                                            public void onDone(String utteranceId) {
                                                Log.d("VoiceTTS", "TTS being released");
                                                clearTtsEngine();
                                            }

                                            @Override
                                            public void onError(String utteranceId) {
                                            }
                                        });

                                        // Speaking here
                                        voice.speak(text, TextToSpeech.QUEUE_ADD, p);
                                    }
                                } catch (Exception e) {
                                    clearTtsEngine();
                                    Log.d("ErrorLog", "Error occurred while voice play");
                                    e.printStackTrace();
                                }
                            }
                        });
                    }
                }).start();
            }
        } catch (Exception e) {
            clearTtsEngine();
            Log.d("ErrorLog", "Error occurred while voice play");
            e.printStackTrace();
        }
    }

    public static void clearTtsEngine() {
        if (voice != null) {
            voice.stop();
            voice.shutdown();
            voice = null;
        }
    }
}

However, the problem I am facing is the delay associated with initializing the Google TTS engine: about 6-8 seconds on my devices.

I have read in other posts that this delay can be avoided by using other TTS engines. Since I always develop on my Samsung phone, which has its own proprietary TTS engine configured by default, I never noticed this issue until I tested my app on phones from other brands, which have the Google TTS engine configured as the default. Ideally, though, I don't want to force users to install another app alongside my own, so I would like this to work with the default Google TTS engine itself.

Through some erroneous coding which I later rectified, I realized that if I could keep the TextToSpeech object initialized beforehand, so that it is never null once initialized, I could seemingly bypass this delay.

However, since the resource has to be shut down once we are done with it, I am not able to keep the object alive and initialized for long. I also do not know when to initialize or shut down the resource, since I technically need the voice to play any time the specific event occurs, which would mostly be when my app is not open on the phone.

So my questions are the following :

Can we somehow reduce or eliminate the initialization delay of Google TTS Engine, programmatically or otherwise?
Is there any way through which I can keep the TextToSpeech object alive and initialized at all times like say, through a service? Or would this be a bad, resource-consuming design?
Also is using a static TextToSpeech object the right way to go, for my requirements?

Any solutions along with code would be appreciated.

Update: I have confirmed that the delay is associated exclusively with Google TTS engine, as I have tried using other free and paid TTS engines, wherein there is little or no lag. But I would still prefer to not have any third party dependencies, if possible, and would like to make this work with Google TTS Engine.

UPDATE: I have seemingly bypassed this issue by binding this TTS object to a service and accessing it from the service. The service is STICKY (if the service terminates due to memory issue, Android OS will restart the service when memory is available again) and is configured to restart on reboot of the device.

The service only initializes the TTS object and does no other work. I am not explicitly stopping the service, allowing it to run as long as possible. I have defined the TTS object as a static, so that I can access it from other classes of my app.

Although this seems to be working amazingly well, I am concerned that it could lead to memory or battery issues (in my specific situation, where the service handles only object initialization and then remains dormant). Is there any problem with my design, or are there any further improvements or checks I could make?

Manifest file :

<uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED"/>


<application
    android:allowBackup="false"
    android:icon="@drawable/ic_launcher"
    android:label="@string/app_name" >
    <activity
        android:name="activity.MainActivity"
        android:label="@string/app_name"
        android:screenOrientation="portrait" >
        <intent-filter>
            <action android:name="android.intent.action.MAIN" />

            <category android:name="android.intent.category.LAUNCHER" />
        </intent-filter>
    </activity>

    <receiver
        android:name="services.BroadcastReceiverOnBootComplete"
        android:enabled="true"
        android:exported="false">
        <intent-filter>
            <action android:name="android.intent.action.BOOT_COMPLETED" />
        </intent-filter>
        <intent-filter>
            <action android:name="android.intent.action.PACKAGE_REPLACED" />
            <data android:scheme="package" />
        </intent-filter>
        <intent-filter>
            <action android:name="android.intent.action.PACKAGE_ADDED" />
            <data android:scheme="package" />
        </intent-filter>
    </receiver>


    <service android:name="services.TTSService"></service>

BroadcastReceiver code :

public class BroadcastReceiverOnBootComplete extends BroadcastReceiver {

    @Override
    public void onReceive(Context context, Intent intent) {
        // Constant-first comparison avoids a NullPointerException if the action is null
        if (Intent.ACTION_BOOT_COMPLETED.equals(intent.getAction())) {
            Intent serviceIntent = new Intent(context, TTSService.class);
            context.startService(serviceIntent);
        }
    }
}

TTSService code:

public class TTSService extends Service {

    // Shared engine instance; null until the service has started
    private static TextToSpeech voice = null;

    public static TextToSpeech getVoice() {
        return voice;
    }

    public TTSService() {
    }

    @Nullable
    @Override
    public IBinder onBind(Intent intent) {
        // not supporting binding
        return null;
    }

    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        try {
            // onStartCommand() runs on every startService() call, so reuse an existing engine
            if (voice == null) {
                Log.d("TTSService", "Text-to-speech object initializing");

                voice = new TextToSpeech(TTSService.this, new TextToSpeech.OnInitListener() {
                    @Override
                    public void onInit(final int status) {
                        Log.d("TTSService", "Text-to-speech object initialization complete");
                    }
                });
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        return Service.START_STICKY;
    }

    @Override
    public void onDestroy() {
        clearTtsEngine();
        super.onDestroy();
    }

    public static void clearTtsEngine() {
        if (voice != null) {
            voice.stop();
            voice.shutdown();
            voice = null;
        }
    }
}

Modified VoiceGenerator code:

public class VoiceGenerator {

    // Declared here because the constructor assigns it
    private final Context context;

    private TextToSpeech voice = null;

    public VoiceGenerator(Context context) {
        this.context = context;
    }

    // text is final so the anonymous Runnable below can capture it
    public void voiceInit(final String text) {
        try {
            if (voice == null) {
                new Thread(new Runnable() {
                    @Override
                    public void run() {
                        // Reuse the engine that TTSService keeps initialized
                        voice = TTSService.getVoice();
                        if (voice == null)
                            return;

                        voice.setLanguage(Locale.US);
                        HashMap<String, String> p = new HashMap<String, String>();
                        p.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, "ThisUtterance");

                        // Register the listener before speaking so no callback is missed
                        voice.setOnUtteranceProgressListener(new UtteranceProgressListener() {
                            @Override
                            public void onStart(String utteranceId) {
                            }

                            @Override
                            public void onDone(String utteranceId) {
                            }

                            @Override
                            public void onError(String utteranceId) {
                            }
                        });

                        voice.speak(text, TextToSpeech.QUEUE_ADD, p);
                    }
                }).start();
            }
        } catch (Exception e) {
            Log.d("ErrorLog", "Error occurred while voice play");
            e.printStackTrace();
        }
    }
}

have you checked this one https://github.com/GoogleCloudPlatform/android-docs-samples — Mikhail Kim, Mar 01 '17 at 19:36
I'm keeping a `TextToSpeech` instance alive in a `Service` and can't say that I've seen any problem with that. There's no need to release the resource when ever you finish speaking something. — Markus Kauppinen, Mar 03 '17 at 08:50
Then it would give 'tts instance has leaked out' exception or something like that — SoulRayder, Mar 03 '17 at 08:56
You might have a problem with the `Context`. You should give `TextToSpeech` a `Context` that's valid until you eventually call `shutdown()` on TTS. So in a `Service` the keyword `this` would give you a context that's valid for the life time of the service and you could shutdown TTS in the service's `onDestroy()`. — Markus Kauppinen, Mar 03 '17 at 11:13
True...but point is..i can't keep the service alive indefinitely can i? That is bound to consume lot of power and cause issues... and as soon as service terminates...the texttospeech object terminates as well — SoulRayder, Mar 03 '17 at 11:16
A service that does nothing most of the time won't consume much power unless it prevents the device from going to sleep (i.e. holds a wakelock). And a service can be [made to restart itself](https://stackoverflow.com/questions/9093271/start-sticky-and-start-not-sticky) if it gets forcefully shut down by the Android system. — Markus Kauppinen, Mar 03 '17 at 13:00
@MarkusKauppinen Will try what you suggested, in the meanwhile, can you double-check if this is working with Google TTS Engine, or if you have another TTS installed on your device and reply here? — SoulRayder, Mar 03 '17 at 15:08
Does the TTS engine only need to speak when your app (or an Activity of your app) is in the foreground? — brandall, Mar 03 '17 at 19:42
@brandall Nope, it should be able to speak anytime the specific event occurs, most likely when app activity is not in foreground. — SoulRayder, Mar 03 '17 at 19:43
When an event is triggered, where do you currently initialise the TTS object? Does the event trigger a short-lived Activity, or from a background thread inside an IntentService or BroadcastReceiver etc? — brandall, Mar 03 '17 at 19:48
The event triggers a listener service, and I do not know where and how to initialize and maintain my TTS object for this scenario described in my question — SoulRayder, Mar 03 '17 at 19:51
One more question... Are you requesting their network synthesised voice? — brandall, Mar 03 '17 at 20:01
I am using the system pre-installed or installed Google TTS app.I am not explicitly requesting for a voice over network. — SoulRayder, Mar 03 '17 at 20:03

brandall · Accepted Answer · 2017-10-22T17:36:11.010

I'm the developer of the Android application Saiy. That isn't a shameless plug, it's to demonstrate that I use the design pattern you are considering and I've 'been through' what has prompted your question.

It's fresh in my mind, as I've spent the last year rewriting my code and had to give great consideration to the surrounding issue.

Can we somehow reduce or eliminate the initialization delay of Google TTS Engine, programmatically or otherwise?

I asked a similar question some time ago. Initialising the Text to Speech object on a background thread, where it is not competing with other tasks, can reduce the delay slightly (as I see you are already doing in your posted code).

You can also make sure that the request to speak is not being delayed further by selecting an embedded voice, rather than one dependent on a network:

In API 21+ check out the options on the Voice class. Particularly getFeatures() where you can examine the latency and requirement for a network.

In API <21 - Set the KEY_FEATURE_NETWORK_SYNTHESIS to false inside your parameters.
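The selection rule for API 21+ can be sketched in plain Java. This is only an illustration under stated assumptions: `VoiceInfo` and `pickEmbedded` are hypothetical names, and `VoiceInfo` stands in for the values that `Voice.getLatency()` and `Voice.isNetworkConnectionRequired()` report for each entry of `TextToSpeech.getVoices()`. On a device, the chosen voice would then be passed to `TextToSpeech.setVoice()`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class EmbeddedVoicePicker {

    /** Hypothetical stand-in for the parts of android.speech.tts.Voice used here. */
    static final class VoiceInfo {
        final String name;
        final int latency;          // lower is faster, as with Voice.getLatency()
        final boolean needsNetwork; // as with Voice.isNetworkConnectionRequired()

        VoiceInfo(String name, int latency, boolean needsNetwork) {
            this.name = name;
            this.latency = latency;
            this.needsNetwork = needsNetwork;
        }
    }

    /** Returns the lowest-latency voice that works without a network, if there is one. */
    static Optional<VoiceInfo> pickEmbedded(List<VoiceInfo> voices) {
        return voices.stream()
                .filter(v -> !v.needsNetwork)
                .min(Comparator.comparingInt(v -> v.latency));
    }

    public static void main(String[] args) {
        // Made-up sample data, not real engine output
        List<VoiceInfo> voices = new ArrayList<>();
        voices.add(new VoiceInfo("en-us-network", 400, true));
        voices.add(new VoiceInfo("en-us-local", 200, false));
        voices.add(new VoiceInfo("en-gb-local", 300, false));

        System.out.println(pickEmbedded(voices).map(v -> v.name).orElse("none"));
    }
}
```

If no embedded voice is available, the sketch returns an empty result, in which case the engine's default voice would simply be left in place.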

Regardless of the above, the Google TTS Engine has the longest initialisation time of any of the engines I've tested (all of them I think). I believe this is simply because they are using all available resources on the device to deliver the highest quality voice they can.

From my own testing, this delay depends directly on the hardware of the device: the more RAM and the more performant the processor, the shorter the initialisation time. The same can be said for the current state of the device. I think you'll find that after a reboot, when there is free memory and Android does not need to kill other processes, the initialisation time will be reduced.

In summary, other than the above mentioned, no, you cannot reduce the initialisation time.

Is there any way through which I can keep the TextToSpeech object alive and initialized at all times like say, through a service? Or would this be a bad, resource-consuming design?
Also is using a static TextToSpeech object the right way to go, for my requirements?

As you've noted, one way to avoid the initialisation time is to remain bound to the engine. However, there are further problems that you may wish to consider before doing this.

If the device is in a state where it needs to free up resources, which is the same state that causes an extended initialisation delay, Android is well within its rights to garbage collect this binding. If you hold this binding in a background service, the service can be killed, putting you back to square one.

Additionally, if you remain bound to the engine, your users will see the collective memory usage in the Android running application settings. For the many, many users who incorrectly consider (dormant) memory usage directly proportional to battery drain, from my experience, this will cause uninstalls and poor app ratings.

At the time of writing, Google TTS is bound to my app at a cost of 70 MB.

If you still want to proceed on this basis, you can attempt to get Android to prioritise your process and kill it last - You'd do this by using a Foreground Service. This opens another can of worms though, which I won't go into.

Effectively, binding to the engine in a service and checking that service is running when you want the engine to speak, is a 'singleton pattern'. Making the engine static within this service would serve no purpose that I can think of.

You can see here how I begin to handle TTS initialisation and the associated problems that can occur - lag included.

Finally, to share my experience as to how I've dealt with the above.

I have 'Google is slow to initialise' at the top of my 'known bugs' and 'FAQ' in the application.

I monitor the time it takes for the engine to call onInit. If it's taking too long, I raise a notification to the user and direct them to the FAQ, where they are gently advised to try another TTS engine.

I run a background timer, that releases the engine after a period of inactivity. This amount of time is configurable by the user and comes with initialisation delay warnings...
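The bookkeeping behind the last two points (timing `onInit` and releasing the engine after inactivity) might look roughly like the sketch below. It is not the code used in Saiy. `EngineLifecycle`, its `Engine` interface and both thresholds are hypothetical, and the clock is injected so the logic can be exercised without a device. On Android, `Engine.shutdown()` would wrap `TextToSpeech.shutdown()`, and `releaseIfIdle()` would be called from whatever timer the app already runs.

```java
import java.util.function.LongSupplier;

public class EngineLifecycle {

    /** Hypothetical stand-in for the engine handle. */
    interface Engine {
        void shutdown();
    }

    private final LongSupplier clockMillis;
    private final long slowInitMillis;
    private final long idleTimeoutMillis;

    private Engine engine;
    private long initStartedAt;
    private long lastUsedAt;

    EngineLifecycle(LongSupplier clockMillis, long slowInitMillis, long idleTimeoutMillis) {
        this.clockMillis = clockMillis;
        this.slowInitMillis = slowInitMillis;
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Call just before constructing the engine. */
    void onInitStarted() {
        initStartedAt = clockMillis.getAsLong();
    }

    /** Call from the init callback; returns true if init was slow enough to warn the user. */
    boolean onInitFinished(Engine ready) {
        long now = clockMillis.getAsLong();
        engine = ready;
        lastUsedAt = now;
        return now - initStartedAt > slowInitMillis;
    }

    /** Call whenever the engine speaks, to restart the inactivity window. */
    void onUsed() {
        lastUsedAt = clockMillis.getAsLong();
    }

    /** Call periodically; shuts the engine down once it has been idle for the full timeout. */
    boolean releaseIfIdle() {
        if (engine == null) {
            return false;
        }
        if (clockMillis.getAsLong() - lastUsedAt < idleTimeoutMillis) {
            return false;
        }
        engine.shutdown();
        engine = null;
        return true;
    }

    boolean isBound() {
        return engine != null;
    }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock, advanced by hand
        EngineLifecycle lifecycle = new EngineLifecycle(() -> now[0], 3_000, 60_000);

        lifecycle.onInitStarted();
        now[0] = 7_000; // pretend the engine took 7 s to report ready
        boolean slow = lifecycle.onInitFinished(() -> System.out.println("engine released"));
        System.out.println("slow init: " + slow);

        now[0] = 30_000;
        System.out.println("released early: " + lifecycle.releaseIfIdle());
        now[0] = 70_000;
        System.out.println("released after timeout: " + lifecycle.releaseIfIdle());
    }
}
```

Making the idle timeout a constructor argument mirrors the user-configurable release period described above. A longer timeout trades memory for fewer slow re-initialisations.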

I know the above doesn't solve your problems, but perhaps my suggestions will pacify your users, which is a distant second to solving the problem, but hey...

I've no doubt Google will gradually increase the initialisation performance - Four years ago, I was having this problem with IVONA, who eventually did a good job on their initialisation time.

Thanks a ton for your thorough answer.. it's a shame really..considering that i just incorporated a sticky service with a boot time reinitializing broadcast receiver to combat this problem...and has so far worked wonders... I had initially done something along the lines of your final solution (suggesting other TTS engines to the users)... so i have some further questions...will my approach of having sticky service handling *only* the initialization part of the tts engine cause the high memory usage...since..it is only used on the occurrence of the event at certain discrete time instants in the day — SoulRayder, Mar 04 '17 at 02:46
*the service-initialized TTS object would be used only at certain discrete time instants on occurrence of the said event — SoulRayder, Mar 04 '17 at 03:26
@SoulRayder so long as any part of your application has a binding to another process, it will show as a collective total in the Android running applications. Inside those settings, it shows the memory breakdown. You should be able to view this now? The only real negative to this, is the perception of your users - Unless of course a user's device has poor hardware - and memory allocation of this size is therefore 'proportionally too high'. — brandall, Mar 04 '17 at 03:29
In your experience, within how many days does this "high memory usage" error manifest? Currently I am running my app with the service based design for over 10 hours.. and the app does not even figure into the high-memory (RAM) consuming apps in my device's background app list — SoulRayder, Mar 04 '17 at 03:32
@SoulRayder it's constant. If your application is indeed bound to the application, it should show up? Are you certain it is currently bound and hasn't been killed - and the service is still running? — brandall, Mar 04 '17 at 03:34
In my service, all I am doing is initializing the TTS object .. and I have defined it as a static object so that I can access the object from my other classes though the service.. and the service is configured to restart at boot time using a broadcast receiver, or in case of service termination .. through STICKY intent. Would this design be prone to that problem? Or any issues in this design? — SoulRayder, Mar 04 '17 at 03:34
@SoulRayder the design pattern is fine - but it doesn't explain why it's not showing up in your memory usage... Do you access it statically, directly via `MyService.ttsObject.speak(etc)`? — brandall, Mar 04 '17 at 03:39
@SoulRayder From your comments and edit, I can only suggest you set up a counter for how many times your service is started and stopped during normal device usage. Increment it in the shared preferences or something. If Android is continually starting and stopping the service it could potentially cause battery drain as you will be initialising the engine every time. I need sleep. Will check back tomorrow. — brandall, Mar 04 '17 at 03:55
Sure thanks a ton for your inputs... I will post my updated code here as well.. Please let me know your inputs on that as well tomorrow :) — SoulRayder, Mar 04 '17 at 04:16
I have posted my updated code. I also tried what you asked regarding the logging.. will update that result tomorrow. Please let me know if any potential issues in my updated code.. so far this has been working well in Samsung, Moto and Lenovo devices on normal operation and even after multiple device restarts, and no battery/memory issues so far — SoulRayder, Mar 04 '17 at 15:22
@SoulRayder Thanks for the bounty. There are a few issues with your code. But, before I go into them, run the logging for 24 hours - use resource intensive apps, such as Facebook, Chrome and YouTube and see what happens to your service (add more logging for when something is null). Happy to take this to chat tomorrow. — brandall, Mar 04 '17 at 17:46
I am travelling and will get back to u in a few days after trying this out in normal operation... will start a chat with u to update my findings...again..thanks for the guidance — SoulRayder, Mar 04 '17 at 17:52
@brandall Can you please check this https://stackoverflow.com/questions/45295158/texttospeech-speak-taking-a-long-time-for-speakout-in-initial-case-how-can-it-b . — Sunil Sunny, Jul 25 '17 at 06:14

Ziad H. · Answer 2 · 2019-09-05T01:16:59.717

well, guys! I think I found the proper code to do this without any delay..

You can initialize TextToSpeech in onCreate() like this:

textToSpeech = new TextToSpeech(this, this); // textToSpeech is a TextToSpeech field of the Activity

but first you need to implement TextToSpeech.OnInitListener, and then you need to override the onInit() method:

@Override
public void onInit(int status) {

    if (status == TextToSpeech.SUCCESS) {
        int result = textToSpeech.setLanguage(Locale.US);

        if (result == TextToSpeech.LANG_MISSING_DATA
                || result == TextToSpeech.LANG_NOT_SUPPORTED) {
            Toast.makeText(getApplicationContext(), "Language not supported", Toast.LENGTH_SHORT).show();
        } else {
            button.setEnabled(true);
        }

    } else {
        Toast.makeText(getApplicationContext(), "Init failed", Toast.LENGTH_SHORT).show();
    }
}

I also noticed that if you didn't set the language in onInit() there is gonna be a delay!!

And now you can write the method that says the text:

private void speakOut(final String detectedText) {
    if (textToSpeech != null) {
        textToSpeech.stop(); // stop and say the new word
        textToSpeech.speak(detectedText, TextToSpeech.QUEUE_FLUSH, null, null);
    }
}
