
Whenever I switch the AVSpeechSynthesizer voice to a (different) German voice, the app hangs for a few seconds (depending on the device) before speech output starts.

Looking at the console output, I see that the German language rule data is roughly five to eight times larger than that of, e.g., English or Italian:

10:55:16.137820+0200    ... #MobileAsset listing ...'[Available: true, Language: de-DE]'
...
10:55:07.636488+0200    ... Loading on disk rule data: 4392529
10:55:10.818661+0200    ... processing rules: 459, NS: 28669
10:55:10.840711+0200    ... Creating playback session rate: 22050, channels 1

This shows that loading the on-disk rule data took about 3.2 secs for the German voice.

When I look at the loading of an English or Italian voice, the load times are much shorter:

10:55:16.137820+0200    ... #MobileAsset listing ...'[Available: true, Language: en-US]'
...
10:55:16.148741+0200    ... Loading on disk rule data: 862210
10:55:16.407063+0200    ... processing rules: 12192, NS: 1611
10:55:16.428606+0200    ... Creating playback session rate: 22050, channels 1
10:55:16.137820+0200    ... #MobileAsset listing ...'[Available: true, Language: it-IT]'
...
10:54:50.816431+0200    ... Loading on disk rule data: 536565
10:54:51.493129+0200    ... processing rules: 2567, NS: 4149
10:54:51.514703+0200    ... Creating playback session rate: 22050, channels 1

These show load times of only about 0.26 and 0.68 secs!
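The durations and size ratios quoted above can be re-derived from the log lines. Below is a minimal sketch of that arithmetic in Swift. The helper names `seconds(fromLogTime:)` and `elapsed(from:to:)` are made up for illustration, and the timestamps and byte counts are copied by hand from the logs, with the `+0200` suffix dropped.

```swift
import Foundation

/// Hypothetical helper: converts "HH:mm:ss.ffffff" into seconds since midnight.
func seconds(fromLogTime stamp: String) -> Double? {
    let parts = stamp.split(separator: ":")
    guard parts.count == 3,
          let hours = Double(parts[0]),
          let minutes = Double(parts[1]),
          let secs = Double(parts[2]) else { return nil }
    return hours * 3600 + minutes * 60 + secs
}

/// Hypothetical helper: seconds between two log timestamps on the same day.
func elapsed(from start: String, to end: String) -> Double? {
    guard let a = seconds(fromLogTime: start),
          let b = seconds(fromLogTime: end) else { return nil }
    return b - a
}

// "Loading on disk rule data" -> "processing rules" timestamps, plus rule data size.
let loads: [(voice: String, start: String, end: String, bytes: Double)] = [
    ("de-DE", "10:55:07.636488", "10:55:10.818661", 4_392_529),
    ("en-US", "10:55:16.148741", "10:55:16.407063", 862_210),
    ("it-IT", "10:54:50.816431", "10:54:51.493129", 536_565),
]

let germanBytes = loads[0].bytes
for load in loads {
    guard let secs = elapsed(from: load.start, to: load.end) else { continue }
    let ratio = germanBytes / load.bytes
    print(load.voice,
          String(format: "load: %.2f s,", secs),
          String(format: "German rule data is %.2fx this size", ratio))
}
```

With the values from the logs this gives roughly 3.18 s for German versus 0.26 s and 0.68 s for English and Italian, and a German rule data size about 5.1 and 8.2 times that of English and Italian, respectively.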

Interestingly, in a small test app with exactly the same setup and usage of AVSpeechSynthesizer as in my main app, I can NOT reproduce these lengthy load times, i.e. the lag before speech output starts after changing to a (different) German voice.

This is my code for calling AVSpeechSynthesizer.speak(_):

    func speak(_ textToSpeak: String) {

        appState.isPrePause = true

        // Detect the language of the incoming text; fall back to English if detection fails.
        // (Requires `import NaturalLanguage` and `import AVFoundation` at the top of the file.)
        let lang = NLLanguageRecognizer.dominantLanguage(for: textToSpeak)?.rawValue ?? "en"

        // Select a voice based on the detected language.
        // AVSpeechSynthesisVoice(language:) returns nil if no voice matches.
        let voice = AVSpeechSynthesisVoice(language: lang)
        if voice == nil {
            print("WARNING: no voice for the current language \(lang). Falling back to default voice.")
        }
        
        let utterance = AVSpeechUtterance(string: textToSpeak)

        utterance.voice = voice
        utterance.preUtteranceDelay = appState.preUtteranceDelay
        utterance.postUtteranceDelay = appState.postUtteranceDelay

        avSpeechSynth.speak(utterance)
    }

The described lag is happening between calling avSpeechSynth.speak(utterance) and the Synthesizer Delegate callback "didStart".
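To put a number on that lag independently of the system console timestamps, one option is to record the time when `speak(_:)` is called and compare it with the time the `didStart` delegate callback fires. The following is only a sketch: `LagTracker` and `TimedSpeaker` are hypothetical names, it assumes the delegate receives the same `AVSpeechUtterance` instance that was passed to `speak(_:)`, and it assumes `speak(_:)` and the delegate callback run on the same thread (the dictionary is not synchronized).

```swift
import AVFoundation

/// Hypothetical helper: remembers when speak(_:) was called for each utterance.
struct LagTracker {
    private var speakTimes: [ObjectIdentifier: TimeInterval] = [:]

    mutating func markSpeak(_ key: ObjectIdentifier, at time: TimeInterval) {
        speakTimes[key] = time
    }

    /// Returns the seconds elapsed since markSpeak for this key, or nil if the key is unknown.
    mutating func markStart(_ key: ObjectIdentifier, at time: TimeInterval) -> TimeInterval? {
        guard let t0 = speakTimes.removeValue(forKey: key) else { return nil }
        return time - t0
    }
}

/// Hypothetical wrapper that logs the speak -> didStart interval per utterance.
final class TimedSpeaker: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    private var tracker = LagTracker()

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ utterance: AVSpeechUtterance) {
        // systemUptime is monotonic, so it is unaffected by wall-clock changes.
        tracker.markSpeak(ObjectIdentifier(utterance),
                          at: ProcessInfo.processInfo.systemUptime)
        synthesizer.speak(utterance)
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didStart utterance: AVSpeechUtterance) {
        let now = ProcessInfo.processInfo.systemUptime
        guard let lag = tracker.markStart(ObjectIdentifier(utterance), at: now) else { return }
        // Note: the interval may include utterance.preUtteranceDelay.
        let voiceName = utterance.voice?.name ?? "default"
        print("didStart lag:", String(format: "%.3f s", lag), "| voice:", voiceName)
    }
}
```

Routing the utterances of both the main app and the small test app through the same wrapper would give directly comparable numbers per voice, which may help narrow down where the two setups differ.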

Has anybody experienced something like this? Any suggestions on where to dig further?

UPDATE 04-2023:

To my surprise, on the iOS 16.4 Simulator with Xcode 14.3, switching German voices from one utterance to the next was as fast as it should be: no more 3+ secs delay! Cool, had Apple finally fixed the bug?! I then tried on devices. But no: on a device with iOS 16.4.1 installed, the issue is the same as always, and the delays between utterances with different German voices are back.

I reinstalled the app on the device and re-downloaded the German voices used, but no luck.

This is the console output of the app running on device. The delay happens after the first "[AXTTSCommon] Invalid rule:" appears in the console.

Speech Synthesizer - Current utterance voice: Optional("Viktor (Enhanced)") | language: Optional("de-DE")
2023-04-09 13:04:06.172618+0200 SpeechApp[914:31631] [AXTTSCommon] Invalid rule:   <-- DELAY HAPPENS AFTER THIS LINE
2023-04-09 13:04:10.052499+0200 SpeechApp[914:31631] [AXTTSCommon] Invalid rule:
2023-04-09 13:04:10.053138+0200 SpeechApp[914:31631] [AXTTSCommon] Invalid rule:
2023-04-09 13:04:10.055567+0200 SpeechApp[914:31631] [AXTTSCommon] Invalid rule:
2023-04-09 13:04:10.113567+0200 SpeechApp[914:31164] [audio] 

--- SpeechSynthesizer Delegate - did START speaking utterance.

The console output of the Simulator shows only one line with "[AXTTSCommon] Invalid rule:" and moves over it quickly, without any delay:

Speech Synthesizer - Current utterance voice: Optional("Viktor (Enhanced)") | language: Optional("de-DE")
2023-04-09 13:01:59.764986+0200 SpeechApp[7145:111421] [AXTTSCommon] Invalid rule:
2023-04-09 13:01:59.778640+0200 SpeechApp[7145:108690] [audio] 

--- SpeechSynthesizer Delegate - did START speaking utterance.

Can anyone confirm that switching German voices between utterances works correctly on Simulator, while still showing unacceptable delays between utterances on device?

Any idea what the difference between Simulator and device could be? This could hint at the core of the issue.

KlausM
  • I experienced the same problem with a Flutter plugin. The iOS part of the plugin is written in Swift and it has a big delay with German voices: https://github.com/dlutton/flutter_tts/issues/323. I also made a small Swift app with exactly the same code as in the plugin. There it is fast. No idea what to do... Did you find a solution? – holger.meyer Feb 05 '23 at 18:46
  • @holger.meyer No, I have not found a solution yet. Very annoying, as switching languages on the fly between paragraphs is a main feature of my app (currently in beta). A small test app I made does not have the issue, but I have no idea why the test app behaves differently. Apple asked for a sysdiagnose in September last year, but I have not gotten any further feedback from Apple. May I ask you to file a feedback with Apple, too? My feedback number is FB11380447 (https://feedbackassistant.apple.com/feedback/11380447). – KlausM Feb 06 '23 at 04:51
  • It seems that the iOS 17 beta fixed the above issue, although German language voice switching is still about 0.2 to 0.5 secs slower than switching between English language voices. – KlausM Jun 23 '23 at 20:22
