I have an iOS app developed in Xcode/Objective-C. It uses the iOS Speech API for continuous speech recognition. It works, but I want to turn the mic icon red as soon as speech starts, and I also want to detect when speech ends.
I implement the SFSpeechRecognitionTaskDelegate protocol, which provides the callbacks speechRecognitionDidDetectSpeech: and speechRecognitionTask:didHypothesizeTranscription:, but these do not fire until the end of the first word has been processed, not at the very start of speech.
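For reference, this is roughly how I turn the icon red today from the delegate callback (micButton is just a stand-in for my actual control):

- (void)speechRecognitionDidDetectSpeech:(SFSpeechRecognitionTask *)task {
    // Fires only once the recognizer is confident speech is present,
    // which in practice is around the end of the first word.
    dispatch_async(dispatch_get_main_queue(), ^{
        self.micButton.tintColor = [UIColor redColor]; // micButton is a stand-in
    });
}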
I would like to detect the very start of speech (or any noise). I think it should be possible from the AVAudioPCMBuffer delivered to the installTapOnBus: block, but I am not sure how to tell whether a buffer is silence or noise that could be speech.
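Something like the sketch below is what I'm imagining inside the tap: compute an RMS level per buffer and compare it to a threshold. This assumes the tap format is 32-bit float (which [inputNode outputFormatForBus:0] normally is on iOS), and the -50 dBFS cutoff is just a guess that would need tuning:

#import <AVFoundation/AVFoundation.h>
#import <math.h>

// Returns the average power of the first channel in dBFS.
// Assumes a float32 buffer; returns a "silence" floor otherwise.
static float averagePowerForBuffer(AVAudioPCMBuffer *buffer) {
    if (buffer.floatChannelData == NULL || buffer.frameLength == 0) {
        return -160.0f;
    }
    const float *samples = buffer.floatChannelData[0];
    float sum = 0.0f;
    for (AVAudioFrameCount i = 0; i < buffer.frameLength; i++) {
        sum += samples[i] * samples[i];
    }
    float rms = sqrtf(sum / (float)buffer.frameLength);
    return 20.0f * log10f(fmaxf(rms, 1e-9f)); // clamp to avoid log10(0)
}

In the tap block I would then check averagePowerForBuffer(buffer) > -50.0f and turn the icon red on the main thread the first time that flips to YES, but I don't know whether a fixed threshold like this is reliable across devices and environments.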
Also, the Speech API does not fire an event when the person stops talking, i.e. there is no silence detection; it just records until it times out. I have a hack for detecting silence that checks the time elapsed since the last recognition event, but I am not sure if there is a better way to do this.
Code is here:
NSError *outError = nil;

// Configure the session for recording with playback routed to the speaker.
AVAudioSession *audioSession = [AVAudioSession sharedInstance];
[audioSession setCategory:AVAudioSessionCategoryPlayAndRecord
              withOptions:AVAudioSessionCategoryOptionDefaultToSpeaker
                    error:&outError];
[audioSession setMode:AVAudioSessionModeMeasurement error:&outError];
[audioSession setActive:YES
            withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation
                  error:&outError];

SFSpeechAudioBufferRecognitionRequest *speechRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
if (speechRequest == nil) {
    NSLog(@"Unable to create SFSpeechAudioBufferRecognitionRequest.");
    return;
}

audioEngine = [[AVAudioEngine alloc] init];
AVAudioInputNode *inputNode = [audioEngine inputNode];
speechRequest.shouldReportPartialResults = YES;

// iOS speech does not detect end of speech, so must track silence.
lastSpeechDetected = -1;
speechTask = [speechRecognizer recognitionTaskWithRequest:speechRequest delegate:self];

[inputNode installTapOnBus:0
                bufferSize:4096
                    format:[inputNode outputFormatForBus:0]
                     block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when) {
    long millis = [[NSDate date] timeIntervalSince1970] * 1000;
    // Hack: no recognition event for over a second is treated as end of speech.
    if (lastSpeechDetected != -1 && ((millis - lastSpeechDetected) > 1000)) {
        lastSpeechDetected = -1;
        [speechTask finish];
        return;
    }
    [speechRequest appendAudioPCMBuffer:buffer];
}];

[audioEngine prepare];
[audioEngine startAndReturnError:&outError];
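For completeness, the posted code only reads lastSpeechDetected (a long ivar, -1 meaning no speech seen yet); I bump it from the delegate whenever a recognition event arrives, roughly like this:

- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task
  didHypothesizeTranscription:(SFTranscription *)transcription {
    // Each partial hypothesis counts as recent speech activity;
    // I do the same in speechRecognitionDidDetectSpeech:.
    lastSpeechDetected = (long)([[NSDate date] timeIntervalSince1970] * 1000);
}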