AVSpeechSynthesizer has a fairly simple API, with no built-in support for saving the spoken output to an audio file.

I'm wondering if there's a way around this - perhaps recording the output as it's played silently, for playback later? Or something more efficient.

Andrew

3 Answers


This is finally possible: as of iOS 13, AVSpeechSynthesizer has write(_:toBufferCallback:):

import AVFoundation

let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "test 123")
utterance.voice = AVSpeechSynthesisVoice(language: "en")
var output: AVAudioFile?

synthesizer.write(utterance) { (buffer: AVAudioBuffer) in
   guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
      fatalError("unknown buffer type: \(buffer)")
   }
   if pcmBuffer.frameLength == 0 {
     // done: an empty buffer signals the end of the utterance
   } else {
     // append buffer to file (the AVAudioFile init and write both throw)
     if output == nil {
       output = try! AVAudioFile(
         forWriting: URL(fileURLWithPath: "test.caf"),
         settings: pcmBuffer.format.settings,
         commonFormat: .pcmFormatInt16,
         interleaved: false)
     }
     try! output?.write(from: pcmBuffer)
   }
}
Jan Berkel
  • When executing under Mac OS 10.15 the callback is never being called. The synthesizer just pronounces the text instead. – vasily Mar 10 '20 at 09:54
  • I just bumped into that on 10.15 too, any change/workaround @vasily? Oh well, back to NSSpeechSynthesizer. – glotcha Apr 27 '20 at 04:46
  • @glotcha No resolution so far – vasily Apr 28 '20 at 05:02
  • This is still a bug in macOS. I've raised a bug report via Feedback Assistant. – Pete Sep 23 '20 at 18:17
  • I've raised this now as a DTS incident with Apple, they're investigating. – Pete Oct 15 '20 at 07:38
  • With Big Sur 11.2 Beta, the callbacks are now being received. However, there is only ever one callback; and the frameLength for that callback is never zero! I've fed this information back to Apple. – Pete Jan 20 '21 at 09:53
  • I've today received the following response from Apple: "On macOS, there is only one callback made with all the audio data. You can know when the speech is finished, because the didFinishSpeaking callback is called when processing is done." – Pete Jan 21 '21 at 21:32
  • With Big Sur 11.1.x, the int32 audio data is received, but would appear to be an approximate square wave; you can hear the audio, but it is very heavily distorted. Raised with DTS... – Pete Jan 22 '21 at 13:51
  • Any solution yet? I am using Xamarin.iOS and the same happens, the app crashes in "WriteUtterance" and the callback is never executed. 1.5 years have passed... Has at least someone found a workaround? – Jaime Santos Nov 22 '21 at 17:56
  • Can't play the output file using AVAudioPlayer. – Purkylin Jan 22 '22 at 00:39
  • For me on macOS 12.2 the callback is still never being called. :/ – Daniel Feb 09 '22 at 19:12
  • Note that the synthesizer instance should not be created locally in the func. It should either be a member of a class or global, otherwise it is released before having the chance to start synthesizing and the callback is never called. – Vladimir Grigorov Apr 21 '22 at 08:13
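Following up on that last comment: the synthesizer has to outlive the write call, and on macOS the completion signal is the didFinish delegate callback rather than a trailing empty buffer. A minimal sketch of one way to wire that up (the SpeechWriter class, file handling, and try? error handling are illustrative, not part of the original answer):

import AVFoundation

final class SpeechWriter: NSObject, AVSpeechSynthesizerDelegate {
   // keep a strong reference so the synthesizer isn't deallocated
   // before the buffer callback has a chance to fire
   private let synthesizer = AVSpeechSynthesizer()
   private var output: AVAudioFile?

   override init() {
      super.init()
      synthesizer.delegate = self
   }

   func write(_ text: String, to url: URL) {
      let utterance = AVSpeechUtterance(string: text)
      utterance.voice = AVSpeechSynthesisVoice(language: "en")
      synthesizer.write(utterance) { [weak self] buffer in
         guard let self = self,
               let pcmBuffer = buffer as? AVAudioPCMBuffer,
               pcmBuffer.frameLength > 0 else { return }
         if self.output == nil {
            self.output = try? AVAudioFile(
               forWriting: url,
               settings: pcmBuffer.format.settings)
         }
         try? self.output?.write(from: pcmBuffer)
      }
   }

   // on macOS there may be only a single callback and no empty
   // trailing buffer, so rely on the delegate to detect completion
   func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                          didFinish utterance: AVSpeechUtterance) {
      output = nil   // releasing the AVAudioFile closes the file
   }
}

// keep `writer` alive (e.g. as a property) for the duration of synthesis
let writer = SpeechWriter()
writer.write("test 123", to: URL(fileURLWithPath: "test.caf"))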

As of now, AVSpeechSynthesizer does not support this. There is no way to get the audio file using AVSpeechSynthesizer. I tried this a few weeks ago for one of my apps and found that it is not possible; nothing has changed for AVSpeechSynthesizer in iOS 8 either.

I too thought of recording the sound as it is being played, but there are many flaws with that approach: the user might be using headphones, the system volume might be low or muted, and it might pick up other external sounds, so it's not advisable to go that route.

Bhumit Mehta

You can use OS X to prepare AIFF files (or perhaps an OS X-based service) via the NSSpeechSynthesizer method startSpeakingString:toURL:.
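For example, a minimal sketch (the text and file path are just illustrative); startSpeakingString:toURL: is exposed to Swift as startSpeaking(_:to:), and the rendering runs asynchronously, so the synthesizer must stay alive until it finishes:

import AppKit

let synth = NSSpeechSynthesizer()
// renders the text to an AIFF file instead of speaking it aloud;
// returns false if synthesis could not be started
let started = synth.startSpeaking("test 123",
                                  to: URL(fileURLWithPath: "test.aiff"))
// synthesis is asynchronous: keep `synth` alive and use its delegate's
// didFinishSpeaking callback to know when the file is ready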

deksden