Whenever I switch the AVSpeechSynthesizer voice to a different German voice, the app hangs for a few seconds (how long depends on the device) before speech output starts.
Looking at the console output, I see that the German language rule data is roughly five to eight times larger than, e.g., the English or Italian rule data:
10:55:16.137820+0200 ... #MobileAsset listing ...'[Available: true, Language: de-DE]'
...
10:55:07.636488+0200 ... Loading on disk rule data: 4392529
10:55:10.818661+0200 ... processing rules: 459, NS: 28669
10:55:10.840711+0200 ... Creating playback session rate: 22050, channels 1
This shows that loading the on-disk rule data took about 3.2 seconds for the German voice.
For an English or Italian voice, these load times are much shorter:
10:55:16.137820+0200 ... #MobileAsset listing ...'[Available: true, Language: en-US]'
...
10:55:16.148741+0200 ... Loading on disk rule data: 862210
10:55:16.407063+0200 ... processing rules: 12192, NS: 1611
10:55:16.428606+0200 ... Creating playback session rate: 22050, channels 1
10:55:16.137820+0200 ... #MobileAsset listing ...'[Available: true, Language: it-IT]'
...
10:54:50.816431+0200 ... Loading on disk rule data: 536565
10:54:51.493129+0200 ... processing rules: 2567, NS: 4149
10:54:51.514703+0200 ... Creating playback session rate: 22050, channels 1
Here the load times are only about 0.26 and 0.68 seconds, respectively!
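A quick check of these figures, taking the rule-data sizes from the "Loading on disk rule data" lines and treating the load time as the difference between the "Loading on disk rule data" and "processing rules" timestamps (each pair falls within the same minute, so only the seconds are subtracted):

$$
\begin{aligned}
\frac{4392529}{862210} &\approx 5.1 && \text{(de-DE vs. en-US rule data)}\\
\frac{4392529}{536565} &\approx 8.2 && \text{(de-DE vs. it-IT rule data)}\\[4pt]
t_{\text{de-DE}} &= 10.818661\,\text{s} - 7.636488\,\text{s} \approx 3.18\,\text{s}\\
t_{\text{en-US}} &= 16.407063\,\text{s} - 16.148741\,\text{s} \approx 0.26\,\text{s}\\
t_{\text{it-IT}} &= 51.493129\,\text{s} - 50.816431\,\text{s} \approx 0.68\,\text{s}
\end{aligned}
$$

So the German load time is roughly 5 to 12 times that of the Italian and English voices in these particular runs.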
Interestingly, in a small test app with exactly the same AVSpeechSynthesizer setup and usage as in my main app, I can NOT reproduce these long load times, i.e., the wait before speech output starts after switching to a different German voice.
This is my code for calling AVSpeechSynthesizer.speak(_:):
import AVFoundation
import NaturalLanguage

// Assumes `appState` and `avSpeechSynth` (an AVSpeechSynthesizer) are properties of the enclosing type.
func speak(_ textToSpeak: String) {
    appState.isPrePause = true

    // Detect the dominant language of the incoming text; fall back to English.
    let lang = NLLanguageRecognizer.dominantLanguage(for: textToSpeak)?.rawValue ?? "en"

    // Select a voice based on the detected language.
    // AVSpeechSynthesisVoice(language:) returns nil if no matching voice is available.
    let voice = AVSpeechSynthesisVoice(language: lang)
    if voice == nil {
        print("WARNING: no voice for the current language \(lang). Falling back to default voice.")
    }

    let utterance = AVSpeechUtterance(string: textToSpeak)
    utterance.voice = voice  // nil: the synthesizer uses its default voice
    utterance.preUtteranceDelay = appState.preUtteranceDelay
    utterance.postUtteranceDelay = appState.postUtteranceDelay
    avSpeechSynth.speak(utterance)
}
The lag occurs between the call to avSpeechSynth.speak(utterance) and the delegate callback speechSynthesizer(_:didStart:).
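To put a number on that lag for every voice switch, rather than reading it off the console, a small timing helper could be called from both ends. The sketch below is a hypothetical `SpeechStartLatencyTracker` (not an Apple API) that records a monotonic timestamp when speak(_:) is called and reports the elapsed seconds when the matching didStart callback arrives. It keys utterances by object identity, assuming the delegate receives the same utterance instance that was passed to speak(_:); it only uses Foundation/Dispatch so the timing logic can be checked on its own.

```swift
import Dispatch
import Foundation

// Hypothetical helper: measures the delay between calling speak(_:) and the
// delegate's speechSynthesizer(_:didStart:) callback for the same utterance.
// Utterances are tracked by object identity, so any class instance works as a key;
// in the app you would pass the AVSpeechUtterance itself.
final class SpeechStartLatencyTracker {
    // Monotonic start times (nanoseconds) keyed by utterance identity.
    private var startTimes: [ObjectIdentifier: UInt64] = [:]
    // Callbacks may arrive on a different thread than speak(_:) was called on,
    // so access to the dictionary is serialized with a lock.
    private let lock = NSLock()

    /// Call immediately before `synthesizer.speak(utterance)`.
    func speakCalled(for utterance: AnyObject) {
        let now = DispatchTime.now().uptimeNanoseconds
        lock.lock()
        defer { lock.unlock() }
        startTimes[ObjectIdentifier(utterance)] = now
    }

    /// Call from `speechSynthesizer(_:didStart:)`.
    /// Returns the elapsed seconds since `speakCalled(for:)`, or nil if the
    /// utterance was never registered (or was already reported).
    func didStart(for utterance: AnyObject) -> TimeInterval? {
        let now = DispatchTime.now().uptimeNanoseconds
        lock.lock()
        defer { lock.unlock() }
        guard let start = startTimes.removeValue(forKey: ObjectIdentifier(utterance)) else {
            return nil
        }
        return TimeInterval(now - start) / 1_000_000_000
    }
}
```

In the speak(_:) method above, this would mean calling `latencyTracker.speakCalled(for: utterance)` right before `avSpeechSynth.speak(utterance)` and `latencyTracker.didStart(for: utterance)` inside `speechSynthesizer(_:didStart:)`, where `latencyTracker` is a hypothetical property. If `preUtteranceDelay` is non-zero it may be included in the measured time, so setting it to 0 while measuring keeps the Simulator and device numbers comparable.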
Has anybody experienced something like this? Any suggestions on where to dig further?
UPDATE 04-2023:
To my surprise, on the iOS 16.4 Simulator with Xcode 14.3, switching German voices from one utterance to the next was as fast as it should be, with no more 3+ second delay. Cool, has Apple finally fixed the bug?! I tried on devices, but no: on a device running iOS 16.4.1, it is the same issue as always, and the delays between utterances with different German voices are back.
I reinstalled the app on the device and re-downloaded the German voices in use, but with no luck.
This is the console output of the app running on the device. The delay (about 3.9 seconds, judging by the timestamps) happens right after the first "[AXTTSCommon] Invalid rule:" line appears in the console.
Speech Synthesizer - Current utterance voice: Optional("Viktor (Enhanced)") | language: Optional("de-DE")
2023-04-09 13:04:06.172618+0200 SpeechApp[914:31631] [AXTTSCommon] Invalid rule: <-- DELAY HAPPENS AFTER THIS LINE
2023-04-09 13:04:10.052499+0200 SpeechApp[914:31631] [AXTTSCommon] Invalid rule:
2023-04-09 13:04:10.053138+0200 SpeechApp[914:31631] [AXTTSCommon] Invalid rule:
2023-04-09 13:04:10.055567+0200 SpeechApp[914:31631] [AXTTSCommon] Invalid rule:
2023-04-09 13:04:10.113567+0200 SpeechApp[914:31164] [audio]
--- SpeechSynthesizer Delegate - did START speaking utterance.
On the Simulator, the console output shows only one "[AXTTSCommon] Invalid rule:" line, and speech starts right after it, without any delay:
Speech Synthesizer - Current utterance voice: Optional("Viktor (Enhanced)") | language: Optional("de-DE")
2023-04-09 13:01:59.764986+0200 SpeechApp[7145:111421] [AXTTSCommon] Invalid rule:
2023-04-09 13:01:59.778640+0200 SpeechApp[7145:108690] [audio]
--- SpeechSynthesizer Delegate - did START speaking utterance.
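To compare the device and Simulator logs above quantitatively, a small sketch that finds the longest pause between consecutive timestamped console lines may help. `secondsSinceMidnight(in:)` and `largestGap(in:)` are hypothetical helpers, not part of any Apple API; they assume the timestamp formats shown in these logs and that all lines fall on the same day.

```swift
import Foundation

// Hypothetical log-analysis helpers. They assume console lines timestamped like
// "2023-04-09 13:04:06.172618+0200 ..." or "10:55:07.636488+0200 ...", and that all
// lines fall on the same day (no handling of midnight rollover).

/// Returns the time of day in seconds since midnight, or nil if the line has no
/// "HH:MM:SS.ffffff" token.
func secondsSinceMidnight(in line: String) -> Double? {
    for token in line.split(separator: " ") {
        // Strip a trailing UTC offset such as "+0200" or "-0500".
        let timePart = token.split(whereSeparator: { $0 == "+" || $0 == "-" }).first ?? token
        let parts = timePart.split(separator: ":")
        guard parts.count == 3,
              let hours = Double(parts[0]),
              let minutes = Double(parts[1]),
              let seconds = Double(parts[2])
        else { continue }
        return hours * 3600 + minutes * 60 + seconds
    }
    return nil
}

/// Finds the largest gap between consecutive timestamped lines and returns it
/// together with the line after which the pause occurred.
func largestGap(in lines: [String]) -> (seconds: Double, afterLine: String)? {
    var previous: (time: Double, line: String)?
    var best: (seconds: Double, afterLine: String)?
    for line in lines {
        guard let time = secondsSinceMidnight(in: line) else { continue }
        if let prev = previous {
            let gap = time - prev.time
            if gap > (best?.seconds ?? -Double.infinity) {
                best = (seconds: gap, afterLine: prev.line)
            }
        }
        previous = (time: time, line: line)
    }
    return best
}
```

Applied to the two logs above, the largest gap on the device is about 3.88 s, right after the first "Invalid rule:" line, while on the Simulator it is about 0.014 s.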
Can anyone confirm that switching German voices between utterances works correctly on the Simulator but still causes unacceptable delays between utterances on a device?
Any idea what the difference between the Simulator and a device could be? It might point to the core of the issue.