SFSpeechRecognizer from Local Video

Question

I'm trying to implement speech transcription (voice to text) from a video. My approach breaks this down into three steps:

1. Convert the video to an audio file (m4a/mp3)
2. Pass the audio file URL to an SFSpeechRecognizer request
3. Parse the results
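
Steps 2 and 3 could look roughly like the sketch below. It assumes speech-recognition authorization has already been granted via `SFSpeechRecognizer.requestAuthorization` and that the app's Info.plist contains `NSSpeechRecognitionUsageDescription`. `transcribe(audioURL:locale:)` and `formatTimestamp(_:)` are hypothetical helper names, and printing one timestamped line per segment is just one way to "parse" the result.

```swift
import Foundation
#if canImport(Speech)
import Speech
#endif

/// Hypothetical helper: formats a time offset in seconds as "mm:ss.hh".
func formatTimestamp(_ seconds: TimeInterval) -> String {
    let hundredths = Int((seconds * 100).rounded())
    let minutes = hundredths / 6000
    let secs = (hundredths / 100) % 60
    let frac = hundredths % 100
    func pad(_ n: Int) -> String { n < 10 ? "0\(n)" : "\(n)" }
    return "\(pad(minutes)):\(pad(secs)).\(pad(frac))"
}

#if canImport(Speech)
/// Steps 2 and 3: recognize speech in an audio file and print timestamped segments.
func transcribe(audioURL: URL, locale: Locale = Locale(identifier: "en-US")) {
    guard let recognizer = SFSpeechRecognizer(locale: locale), recognizer.isAvailable else {
        print("Speech recognizer unavailable for \(locale.identifier)")
        return
    }
    let request = SFSpeechURLRecognitionRequest(url: audioURL)
    request.shouldReportPartialResults = false  // only deliver the final result

    _ = recognizer.recognitionTask(with: request) { result, error in
        if let error = error {
            print("Recognition failed: \(error.localizedDescription)")
            return
        }
        guard let result = result, result.isFinal else { return }
        // Step 3: walk the best transcription segment by segment.
        for segment in result.bestTranscription.segments {
            print("[\(formatTimestamp(segment.timestamp))] \(segment.substring)")
        }
    }
}
#endif
```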

My issue is that I haven't found a way to convert the source video file (say, a .mov) into an audio-only file. The video's AVAsset doesn't report any audio tracks, yet the file plays with sound, so the audio does exist.

I imagine that once step 1 is solved, steps 2 and 3 are trivial, so my question is: what is the best way to convert a video file into an audio-only file that I can then use for transcription?
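
For comparison, an AVFoundation-only attempt at step 1 is sketched below. This is not the answer's method: it swaps in `AVAssetExportSession` with the `AVAssetExportPresetAppleM4A` preset, which writes an audio-only .m4a. It assumes iOS 15 / macOS 12 or later for `loadTracks(withMediaType:)`; `extractAudio(from:)`, `audioOutputURL(for:directory:)` and `AudioExtractionError` are hypothetical names. If `loadTracks` returns no audio tracks for the file, as the question describes, this route won't help and a tool such as FFmpeg is the fallback.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation
#endif

/// Hypothetical helper: clip.mov -> <directory>/clip.m4a
func audioOutputURL(for videoURL: URL,
                    directory: URL = FileManager.default.temporaryDirectory) -> URL {
    let baseName = videoURL.deletingPathExtension().lastPathComponent
    return directory.appendingPathComponent(baseName).appendingPathExtension("m4a")
}

enum AudioExtractionError: Error {
    case noAudioTrack
    case exportSessionUnavailable
    case exportFailed(Error?)
}

#if canImport(AVFoundation)
/// Step 1 (sketch): export only the audio of a local video to an .m4a file.
@available(iOS 15.0, macOS 12.0, *)
func extractAudio(from videoURL: URL) async throws -> URL {
    let asset = AVURLAsset(url: videoURL)

    // Fail early with a clear error if AVFoundation sees no audio track.
    let audioTracks = try await asset.loadTracks(withMediaType: .audio)
    guard !audioTracks.isEmpty else { throw AudioExtractionError.noAudioTrack }

    guard let session = AVAssetExportSession(asset: asset,
                                             presetName: AVAssetExportPresetAppleM4A) else {
        throw AudioExtractionError.exportSessionUnavailable
    }
    let outputURL = audioOutputURL(for: videoURL)
    try? FileManager.default.removeItem(at: outputURL)  // export fails if the file already exists
    session.outputURL = outputURL
    session.outputFileType = .m4a

    // Bridge the completion-handler export into async/await.
    await withCheckedContinuation { (continuation: CheckedContinuation<Void, Never>) in
        session.exportAsynchronously { continuation.resume() }
    }
    guard session.status == .completed else {
        throw AudioExtractionError.exportFailed(session.error)
    }
    return outputURL
}
#endif
```

The returned URL is what step 2 would hand to `SFSpeechURLRecognitionRequest`.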

score 1 · Answer 1 · answered May 11 '22 at 20:01


You can use the FFmpegKit library to extract the audio portion of the video.

Library usage example: https://github.com/tanersener/ffmpeg-kit/tree/main/apple#3-using

Example ffmpeg command for extracting audio: https://stackoverflow.com/a/27413824/5707560
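
A minimal Swift sketch of this suggestion, assuming FFmpegKit has been added to the project (Swift module `ffmpegkit`). It passes the arguments of the linked command (`ffmpeg -i input-video.avi -vn -acodec copy output-audio.aac`) through `FFmpegKit.execute(withArguments:)`, which avoids quoting problems with paths that contain spaces. `-y` is an addition so an existing output file is overwritten rather than prompting. Because `-acodec copy` copies the audio stream without re-encoding, the output extension must match the source codec; `.m4a` assumes AAC audio, which is common for .mov/.mp4 files but not guaranteed. `extractAudioArguments` and `extractAudioWithFFmpeg` are hypothetical names.

```swift
import Foundation
#if canImport(ffmpegkit)
import ffmpegkit
#endif

/// Arguments from the linked command, plus -y to overwrite an existing output file.
func extractAudioArguments(input: String, output: String) -> [String] {
    return ["-y", "-i", input, "-vn", "-acodec", "copy", output]
}

#if canImport(ffmpegkit)
/// Runs synchronously, so call it off the main thread. Returns true on success.
func extractAudioWithFFmpeg(videoPath: String, audioPath: String) -> Bool {
    let session = FFmpegKit.execute(withArguments: extractAudioArguments(input: videoPath,
                                                                         output: audioPath))
    let returnCode = session?.getReturnCode()
    if ReturnCode.isSuccess(returnCode) {
        return true
    }
    print("FFmpeg failed with return code \(returnCode?.description ?? "unknown")")
    return false
}
#endif
```

On success, `URL(fileURLWithPath: audioPath)` can be passed to the speech recognition request from step 2.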


Yehor Smoliakov


