Here are my settings for Google Speech-to-Text AI
Here is the output of Google Speech-to-Text: https://justpaste.it/speechtotext2
Here is the output of YouTube's auto-captioning: https://justpaste.it/ytautotranslate
Here is the video: https://www.youtube.com/watch?v=IOMO-kcqxJ8&ab_channel=SoftwareEngineeringCourses-SECourses
Here is the audio file extracted from the video, which I provided to Google Speech-to-Text: https://storage.googleapis.com/text_speech_furkan/machine_learning_lecture_1.flac
Here are the SRT files with timings assigned:
YouTube's SRT: https://drive.google.com/file/d/1yPA1m0hPr9VF7oD7jv5KF7n1QnV3Z82d/view?usp=sharing
Google Speech-to-Text API's SRT (timings assigned by YouTube): https://drive.google.com/file/d/1AGzkrxMEQJspYenCbohUM4iuXN7H89wH/view?usp=sharing
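Because both SRT files share YouTube's timings, the two transcripts can be lined up cue by cue for comparison. Below is a minimal Python sketch of that, assuming standard SRT blocks (an index line, a `start --> end` line, one or more text lines, then a blank line). `Cue`, `parse_srt` and `pair_cues` are hypothetical helper names, not part of any library.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


# SRT timestamps look like 00:01:02,345 (some tools write '.' instead of ',').
_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    match = _TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues; tolerates a leading BOM and Windows line endings."""
    content = content.lstrip("\ufeff").replace("\r\n", "\n")
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip anything that is not a well-formed cue
        start_raw, end_raw = lines[1].split("-->", 1)
        cues.append(Cue(
            index=int(lines[0]),
            start=_to_seconds(start_raw),
            # Drop optional positioning settings that may follow the end time.
            end=_to_seconds(end_raw.split()[0]),
            text=" ".join(lines[2:]),
        ))
    return cues


def pair_cues(a: list[Cue], b: list[Cue]) -> list[tuple[str, str]]:
    """For each cue in `a`, join the text of every cue in `b` that overlaps it in time."""
    pairs = []
    for cue_a in a:
        overlapping = [cue_b.text for cue_b in b
                       if cue_b.start < cue_a.end and cue_a.start < cue_b.end]
        pairs.append((cue_a.text, " ".join(overlapping)))
    return pairs
```

Usage would be something like `pair_cues(parse_srt(youtube_text), parse_srt(google_text))`, where the two strings hold the contents of the downloaded files. Pairing by overlapping time spans rather than by cue index keeps working even if one file splits a cue that the other keeps whole.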
I compared some sentences, and YouTube's auto-captioning is definitely better.
For example:
Google Speech-to-Text: Represent the **doctor** representation is one of the hardest part of computer AI you will learn about more about that in the future lessons.
What does this mean? Do you think this means that we are not just focused on behavior and **into doubt**. It is more about the reasoning when a human takes an action. There is a reasoning behind it.
YouTube's auto-captioning: represent the **data** representation is one of the hardest part of computer ai you will we will learn more about that in the future lessons
what does this mean do you think this means that we are not just focused on behavior and **input** it is more about the reasoning when a human takes an action there is a reasoning behind it
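To compare the two outputs more systematically than by eye, a word-level diff and edit distance can be computed per sentence. This is a minimal sketch, assuming that lowercasing and dropping punctuation is acceptable normalization; `normalize`, `word_edit_distance` and `differing_words` are hypothetical names. Without a human reference transcript, this measures how much the two systems disagree, not which one is correct.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Lowercase and drop punctuation so only the words themselves are compared."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = cur
    return prev[-1]


def differing_words(a: list[str], b: list[str]) -> list[tuple[str, str]]:
    """Return (a_words, b_words) for each region where the two transcripts disagree."""
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(" ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


if __name__ == "__main__":
    google = ("Represent the doctor representation is one of the hardest part of "
              "computer AI you will learn about more about that in the future lessons.")
    youtube = ("represent the data representation is one of the hardest part of "
               "computer ai you will we will learn more about that in the future lessons")
    g, y = normalize(google), normalize(youtube)
    print("word edits:", word_edit_distance(g, y))
    for g_words, y_words in differing_words(g, y):
        print(f"Google: {g_words!r:15} YouTube: {y_words!r}")
```

Running this over every pair of aligned cues gives a per-sentence disagreement count, which makes it easier to spot where the two systems diverge most.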
I checked many cases, and YouTube is much better at guessing the correct words. How is this even possible?
This is the command I used to extract the audio from the video: `ffmpeg -i "input.mkv" -af aformat=s16:48000 output.flac`
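The same extraction can be scripted from Python. This is a minimal sketch assuming `ffmpeg` is on `PATH`; `build_extract_cmd` and `extract_audio` are hypothetical helper names. It uses the aformat filter's named options (`sample_fmts`, `sample_rates`), which are equivalent to the positional `aformat=s16:48000`, and passes the output file as its own argument rather than as part of the filter string.

```python
import shlex
import subprocess


def build_extract_cmd(src: str, dst: str, sample_rate: int = 48000) -> list[str]:
    """Build the argv for: ffmpeg -i SRC -af aformat=s16:RATE DST."""
    return [
        "ffmpeg",
        "-i", src,
        # Convert to 16-bit samples at the given rate; the output path must be
        # a separate argument, not appended to the filter with ':'.
        "-af", f"aformat=sample_fmts=s16:sample_rates={sample_rate}",
        dst,
    ]


def extract_audio(src: str, dst: str, sample_rate: int = 48000) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if it exits non-zero."""
    subprocess.run(build_extract_cmd(src, dst, sample_rate), check=True)


if __name__ == "__main__":
    # Show the equivalent shell command without running ffmpeg.
    print(shlex.join(build_extract_cmd("input.mkv", "output.flac")))
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.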