I'm trying to run a folder of audio files through speech-to-text recognition on macOS.

If I process just one file, it works, but if I feed it multiple files, only one file is transcribed and the rest throw an error.

I thought I could use DispatchGroup, but it still feeds everything at once instead of waiting for each item to be completed.

Could someone help me to understand what I'm doing wrong?

let recognizer = SFSpeechRecognizer()
recognizer?.supportsOnDeviceRecognition = true
let group = DispatchGroup()
let fd = FileManager.default
fd.enumerator(at: url, includingPropertiesForKeys: nil)?.forEach({ (e) in
    if let url = e as? URL, url.pathExtension == "wav" || url.pathExtension == "aiff" {
        let request = SFSpeechURLRecognitionRequest(url: url)
        group.enter()
        let task =  recognizer?.recognitionTask(with: request) { (result, error) in
            print("Transcribing \(url.lastPathComponent)")
            guard let result = result else {
                print("\(url.lastPathComponent): No message")
                group.leave()
                return
            }
            while  result.isFinal == false {
                sleep(1)
            }
            print("\(url.lastPathComponent): \(result.bestTranscription.formattedString)")
            group.leave()
        }
        group.wait()
    }
})
group.notify(queue: .main) {
    print("Done")
}

Update: I tried DispatchQueue, but it transcribes only one file and hangs.

let recognizer = SFSpeechRecognizer()
recognizer?.supportsOnDeviceRecognition = true
let fd = FileManager.default
let q = DispatchQueue(label: "serial q")
fd.enumerator(at: url, includingPropertiesForKeys: nil)?.forEach({ (e) in
    if let url = e as? URL, url.pathExtension == "wav" {
        let request = SFSpeechURLRecognitionRequest(url: url)
        q.sync {
            let task =  recognizer?.recognitionTask(with: request) { (result, error) in
                guard let result = result else {
                    print("\(url.lastPathComponent): No message")
                    return
                }
                if result.isFinal {
                    print("\(url.lastPathComponent): \(result.bestTranscription.formattedString)")
                }
            }
        }
    }
})
print("Done")
jl303
  • `DispatchGroup` is the wrong approach anyway because the completion handler can be called multiple times and this breaks the well-balanced `enter`/`leave` calls. And `sleep` is horrible. Never wait for something with `sleep`. You need a serial asynchronous `Operation` or an `actor`. – vadian Oct 21 '22 at 18:08
  • Oh, the completion handler for a speech recognizer can be called multiple times? As you say, that won't work then. – Duncan C Oct 21 '22 at 19:00
  • Thanks for the info. I also tried creating DispatchQueue outside the loop and put the task inside q.sync, but it transcribes only one file and hangs. I added my test code at the bottom. Would you mind looking at it and giving some guidance? Thanks! – jl303 Oct 22 '22 at 19:01

2 Answers

If you want your dispatch group to wait for each task to complete before submitting the next one, you need to add a `group.wait()` call inside the loop, after submitting each task.

// Your setup code is unchanged...

fd.enumerator(at: url, includingPropertiesForKeys: nil)?.forEach({ (e) in
    if let url = e as? URL, url.pathExtension == "wav" || url.pathExtension == "aiff" {
        let request = SFSpeechURLRecognitionRequest(url: url)
        group.enter()
        let task =  recognizer?.recognitionTask(with: request) { (result, error) in
            print("Transcribing \(url.lastPathComponent)")
            guard let result = result else {
                print("\(url.lastPathComponent): No message")
                group.leave()
                return
            }
            while  result.isFinal == false {
                sleep(1)
            }
            print("\(url.lastPathComponent): \(result.bestTranscription.formattedString)")
            group.leave()
        }
        group.wait() // <---- Add this
    }
})

That should do it.

Note that doing it this way will block the main thread. You should really wrap the code that submits jobs and waits for the last one to finish in a call to a background dispatch queue.

Something like this:

DispatchQueue.global().async {
  // Code to loop through and submit tasks, including dispatchGroup logic above.
}
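
As a rough illustration of the pattern described above, here is a minimal sketch that runs without the Speech framework. `fakeTranscribe` and `processSequentially` are hypothetical names invented for this example, and the sketch assumes the completion handler is called exactly once per job. With a handler that can fire several times, the `enter`/`leave` calls would no longer balance.

```swift
import Foundation

// Hypothetical stand-in for an asynchronous job.
// Assumption: it calls its completion handler exactly once.
func fakeTranscribe(_ name: String, completion: @escaping (String) -> Void) {
    DispatchQueue.global().asyncAfter(deadline: .now() + 0.05) {
        completion("transcript of \(name)")
    }
}

// Runs the jobs one after another and returns the results in input order.
// This blocks the calling thread, so call it from a background queue.
func processSequentially(_ names: [String]) -> [String] {
    var transcripts: [String] = []
    for name in names {
        let group = DispatchGroup()
        group.enter()
        fakeTranscribe(name) { text in
            transcripts.append(text)
            group.leave()
        }
        // Wait for this job to finish before submitting the next one.
        group.wait()
    }
    return transcripts
}
```

To keep the main thread free, the call would go inside `DispatchQueue.global().async { ... }`, hopping back to `DispatchQueue.main` only to report the results.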
Duncan C
  • vadian said in a comment that the dispatch handler for a speech recognizer can be called multiple times. If that's true the above approach won't work. It has merit for other cases where you want to sequence async background tasks though. – Duncan C Oct 21 '22 at 19:01
This is an async/await solution with a continuation. It processes the files sequentially.

let recognizer = SFSpeechRecognizer()
recognizer?.supportsOnDeviceRecognition = true

let fd = FileManager.default
let enumerator = fd.enumerator(at: url, includingPropertiesForKeys: nil, options: .skipsHiddenFiles)!
Task {
    for case let fileURL as URL in enumerator where ["wav", "aiff"].contains(fileURL.pathExtension) {
        do {
            try await recognizeText(at: fileURL)
        } catch {
            print(error)
        }
    }
}


func recognizeText(at url: URL) async throws {
    return try await withCheckedThrowingContinuation { (continuation : CheckedContinuation<Void, Error>) in
        let request = SFSpeechURLRecognitionRequest(url: url)
        // The handler can fire several times with partial results,
        // so resume only on an error or on the final result.
        _ = recognizer?.recognitionTask(with: request) { (result, error) in
            print("Transcribing \(url.lastPathComponent)")
            if let error = error {
                print("\(url.lastPathComponent): No message")
                continuation.resume(throwing: error)
            } else if let result = result {
                print("\(url.lastPathComponent): \(result.bestTranscription.formattedString)")
                if result.isFinal {
                    continuation.resume(returning: ())
                }
            }
        }
    }
}
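
The key point of the continuation approach is that the handler may be called many times but the continuation must be resumed exactly once (a `CheckedContinuation` traps if it is resumed twice). The sketch below shows that idea in isolation and returns the final text instead of printing it. `FakeResult`, `fakeRecognize`, and `finalText(of:)` are hypothetical stand-ins made up for this example so that it runs without the Speech framework. They are not part of any Apple API.

```swift
import Foundation

// Hypothetical stand-in for a recognition result.
struct FakeResult {
    let text: String
    let isFinal: Bool
}

// Hypothetical stand-in for a recognizer: it delivers one partial
// result and then one final result to the same handler.
func fakeRecognize(_ name: String, handler: @escaping (FakeResult?, Error?) -> Void) {
    DispatchQueue.global().async {
        handler(FakeResult(text: "partial", isFinal: false), nil)
        handler(FakeResult(text: "final text of \(name)", isFinal: true), nil)
    }
}

// Suspends until the final result (or an error) arrives.
func finalText(of name: String) async throws -> String {
    try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<String, Error>) in
        fakeRecognize(name) { result, error in
            if let error = error {
                continuation.resume(throwing: error)
            } else if let result = result, result.isFinal {
                continuation.resume(returning: result.text)
            }
            // Partial results fall through without resuming.
        }
    }
}
```

Awaiting `finalText(of:)` inside a `for` loop then handles one item at a time, just like the loop in the `Task` above.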
vadian
  • That worked! It processes one file at a time. However, if a file is long, it stops transcribing and moves onto next file for some reason. – jl303 Oct 24 '22 at 05:06