
So I built a simple app that does speech recognition using SFSpeechRecognizer and displays the recognized text in a UITextView on the screen. Now I'm trying to make the phone speak that displayed text, but it doesn't work for some reason: the AVSpeechSynthesizer speak function only works before SFSpeechRecognizer has been used. For instance, when the app launches with some welcome text displayed in the UITextView, tapping the speak button makes the phone speak the welcome text. Then, after I record (for speech recognition), the recognized speech is displayed in the UITextView. Now I want the phone to speak that text, but unfortunately it doesn't.

Here is the code:

import UIKit
import Speech
import AVFoundation


class ViewController: UIViewController, SFSpeechRecognizerDelegate, AVSpeechSynthesizerDelegate {

    @IBOutlet weak var textView: UITextView!
    @IBOutlet weak var microphoneButton: UIButton!

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale.init(identifier: "en-US"))!

    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private let audioEngine = AVAudioEngine()

    override func viewDidLoad() {
        super.viewDidLoad()

        microphoneButton.isEnabled = false

        speechRecognizer.delegate = self

        SFSpeechRecognizer.requestAuthorization { (authStatus) in

            var isButtonEnabled = false

            switch authStatus {
            case .authorized:
                isButtonEnabled = true

            case .denied:
                isButtonEnabled = false
                print("User denied access to speech recognition")

            case .restricted:
                isButtonEnabled = false
                print("Speech recognition restricted on this device")

            case .notDetermined:
                isButtonEnabled = false
                print("Speech recognition not yet authorized")
            }

            OperationQueue.main.addOperation() {
                self.microphoneButton.isEnabled = isButtonEnabled
            }
        }
    }

    @IBAction func speakTapped(_ sender: UIButton) {
        let string = self.textView.text
        let utterance = AVSpeechUtterance(string: string!)
        let synthesizer = AVSpeechSynthesizer()
        synthesizer.delegate = self
        synthesizer.speak(utterance)
    }
    @IBAction func microphoneTapped(_ sender: AnyObject) {
        if audioEngine.isRunning {
            audioEngine.stop()
            recognitionRequest?.endAudio()
            microphoneButton.isEnabled = false
            microphoneButton.setTitle("Start Recording", for: .normal)
        } else {
            startRecording()
            microphoneButton.setTitle("Stop Recording", for: .normal)
        }
    }

    func startRecording() {

        if recognitionTask != nil {  // 1. Cancel any recognition task that is already running
            recognitionTask?.cancel()
            recognitionTask = nil
        }

        let audioSession = AVAudioSession.sharedInstance()  // 2. Configure the shared audio session for recording
        do {
            try audioSession.setCategory(AVAudioSessionCategoryRecord)
            try audioSession.setMode(AVAudioSessionModeMeasurement)
            try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
        } catch {
            print("audioSession properties weren't set because of an error.")
        }

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()  // 3. Create a live-audio recognition request

        guard let inputNode = audioEngine.inputNode else {
            fatalError("Audio engine has no input node")
        }  // 4. Get the microphone input node

        guard let recognitionRequest = recognitionRequest else {
            fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
        }  // 5. Make sure the recognition request was created

        recognitionRequest.shouldReportPartialResults = true  // 6. Report partial results as the user speaks

        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in  // 7. Start the recognition task

            var isFinal = false  // 8. Track whether recognition has finished

            if result != nil {

                self.textView.text = result?.bestTranscription.formattedString  // 9. Show the best transcription so far
                isFinal = (result?.isFinal)!
            }

            if error != nil || isFinal {  // 10. On error or a final result, stop recording and clean up
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)

                self.recognitionRequest = nil
                self.recognitionTask = nil

                self.microphoneButton.isEnabled = true
            }
        })

        let recordingFormat = inputNode.outputFormat(forBus: 0)  // 11. Tap the mic input and append buffers to the request
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            self.recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()  // 12. Prepare and start the audio engine

        do {
            try audioEngine.start()
        } catch {
            print("audioEngine couldn't start because of an error.")
        }

        textView.text = "Say something, I'm listening!"

    }

    func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer, availabilityDidChange available: Bool) {
        if available {
            microphoneButton.isEnabled = true
        } else {
            microphoneButton.isEnabled = false
        }
    }
}
– Youssef Hammoud
  • Show. Your. Code. – matt Oct 26 '16 at 19:44
  • @matt I added the code. The original speech-to-text code was from an appcoda tutorial. https://www.appcoda.com/siri-speech-framework/ – Youssef Hammoud Oct 26 '16 at 19:48
  • I found [this link](http://avikam.com/software/sfspeechrecognizer-tutorial) very useful. It contains complete source code for speech-to-text and then text-to-speech using `AVSpeechSynthesizer`. – Asad Ali Feb 04 '17 at 14:38

5 Answers


You should change this line of the startRecording method from:

try audioSession.setCategory(AVAudioSessionCategoryRecord)            

to:

try audioSession.setCategory(AVAudioSessionCategoryPlayAndRecord)
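
For context, with that change applied the session setup in startRecording would read roughly like this (a minimal sketch using the same Swift 3-era APIs as the question; the .defaultToSpeaker option is an optional extra that routes playback to the loudspeaker rather than the earpiece):

let audioSession = AVAudioSession.sharedInstance()
do {
    // PlayAndRecord permits both microphone input and synthesized speech output.
    try audioSession.setCategory(AVAudioSessionCategoryPlayAndRecord, with: .defaultToSpeaker)
    try audioSession.setMode(AVAudioSessionModeMeasurement)
    try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
} catch {
    print("audioSession properties weren't set because of an error.")
}

Note that a later answer points out that AVAudioSessionModeMeasurement can cause very low playback volume; if you hit that, AVAudioSessionModeDefault is worth trying instead.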
– Luca Torella

Please use the code below to fix the issue:

let audioSession = AVAudioSession.sharedInstance()
do {
    try audioSession.setCategory(AVAudioSessionCategoryPlayback)
    try audioSession.setMode(AVAudioSessionModeDefault)
} catch {
    print("audioSession properties weren't set because of an error.")
}

Here, we have to use the above code in the following way:

@IBAction func microphoneTapped(_ sender: AnyObject) {
    if audioEngine.isRunning {
        audioEngine.stop()
        recognitionRequest?.endAudio()

        let audioSession = AVAudioSession.sharedInstance()
        do {
            try audioSession.setCategory(AVAudioSessionCategoryPlayback)
            try audioSession.setMode(AVAudioSessionModeDefault)
        } catch {
            print("audioSession properties weren't set because of an error.")
        }

        microphoneButton.isEnabled = false
        microphoneButton.setTitle("Start Recording", for: .normal)
    } else {
        startRecording()
        microphoneButton.setTitle("Stop Recording", for: .normal)
    }
}

Here, after stopping the audio engine, we set the audio session category to AVAudioSessionCategoryPlayback and the audio session mode to AVAudioSessionModeDefault. Then, when you next call the text-to-speech method, it will work fine.
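
If you end up switching categories in more than one place, it can be cleaner to pull the reset into a helper. A sketch (switchSessionToPlayback is a hypothetical name, not part of any framework):

// Hypothetical helper: restore the session for playback after recognition ends.
func switchSessionToPlayback() {
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryPlayback)
        try audioSession.setMode(AVAudioSessionModeDefault)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }
}

You would then call switchSessionToPlayback() right after recognitionRequest?.endAudio() in microphoneTapped.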

– kiran kumar
  • This comment helped me to solve my issue and didn't leave me with the audio changing volume. It seems the important part is resetting the audio session category and mode once you're finished with recognition. Thanks for sharing this info. – Trevis Thomas Aug 02 '17 at 23:22
  • Thanks, this saved a lot of time. I was searching for the error on the web and not noticing that it happened only after activating the recognizer. I thought it was a bug in 11.0.1, but it is not. – Rishabh Dugar Oct 02 '17 at 16:47

The problem is that when you start speech recognition, you have set your audio session category to Record. You cannot play any audio (including speech synthesis) with an audio session whose category is Record.
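
Concretely, that means the question's speakTapped needs the session put back into a playback-capable category before calling speak. A sketch along the lines of the other answers (it also keeps the synthesizer in a property, since a synthesizer held in a local variable can be deallocated before it finishes speaking):

let synthesizer = AVSpeechSynthesizer()  // a property, so it outlives the tap handler

@IBAction func speakTapped(_ sender: UIButton) {
    let audioSession = AVAudioSession.sharedInstance()
    do {
        // A Record-only session produces no output; switch before speaking.
        try audioSession.setCategory(AVAudioSessionCategoryPlayback)
        try audioSession.setMode(AVAudioSessionModeDefault)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }

    let utterance = AVSpeechUtterance(string: textView.text)
    synthesizer.speak(utterance)
}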

– matt
  • But if you look at the microphoneTapped function triggered on tapping the mic: if the audio engine is running, it will stop it and end the audio. Am I missing something here? – Youssef Hammoud Oct 26 '16 at 20:27
  • I do not say remove the audio session category part. You need _more_ audio session management, not less. – matt Oct 26 '16 at 21:08
  • I'm setting the session category to record while creating a session, but it's still not playing audio. – Mrugesh Tank May 01 '18 at 13:27

When using STT (speech-to-text), you have to set the audio session like this:

AVAudioSession *avAudioSession = [AVAudioSession sharedInstance];

if (avAudioSession) {
    [avAudioSession setCategory:AVAudioSessionCategoryRecord error:nil];
    [avAudioSession setMode:AVAudioSessionModeMeasurement error:nil];
    [avAudioSession setActive:true withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:nil];
}

When using TTS (text-to-speech), set the audio session again, like this:

[regRequest endAudio];

AVAudioSession *avAudioSession = [AVAudioSession sharedInstance];
if (avAudioSession) {
    [avAudioSession setCategory:AVAudioSessionCategoryPlayback error:nil];
    [avAudioSession setMode:AVAudioSessionModeDefault error:nil];
}

It works perfectly for me. Also, the low audio problem is solved.
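For readers following along in Swift, the same session juggling looks roughly like this (a sketch mirroring the Objective-C above, using the same Swift 3-era constants and property names as the question):

// Before starting recognition (STT):
let audioSession = AVAudioSession.sharedInstance()
do {
    try audioSession.setCategory(AVAudioSessionCategoryRecord)
    try audioSession.setMode(AVAudioSessionModeMeasurement)
    try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
} catch {
    print("Couldn't configure the session for recording: \(error)")
}

// Before speaking (TTS):
recognitionRequest?.endAudio()
do {
    try audioSession.setCategory(AVAudioSessionCategoryPlayback)
    try audioSession.setMode(AVAudioSessionModeDefault)
} catch {
    print("Couldn't configure the session for playback: \(error)")
}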

– klefevre
  • I agree with this. Using `AVAudioSessionModeMeasurement` should be examined if one experiences very low volume and/or problems switching between `AVSpeechSynthesizer` and `SFSpeechRecognizer`. – coco Feb 03 '18 at 02:57
  • Yeah, that helps improve the app's efficiency. – Yadukrishnan A Jul 12 '18 at 11:03

Try this:

try audioSession.setCategory(AVAudioSessionCategoryRecord)
– FelixSFD