I have been developing a Python application that takes an audio file, sends it through the Whisper API, and writes the transcript to a .docx file. Does anyone know how to add speaker differentiation (labeling who said what) to the output?
Example without speaker differentiation:
Hey, have you seen my keys anywhere? I think I saw them on the kitchen counter earlier. Let me check again.
Example with speaker differentiation:
Speaker 1: Hey, have you seen my keys anywhere?
Speaker 2: I think I saw them on the kitchen counter earlier. Let me check again.
Any guidance would be great. Thanks!
I have already tried looking this up online, but it is not making much sense to me. I haven't been able to find documentation that relates closely to using the Whisper API.
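To make the goal more concrete, here is a rough sketch of the step I think is missing. It assumes two things I have not confirmed work for my setup: that the transcript is available as timestamped segments (dicts with start, end and text), and that some separate diarization tool can supply speaker turns with start and end times. The function names, the turns structure and all the timestamps below are made up for illustration; nothing here calls the Whisper API itself.

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "Unknown", 0.0
        for turn in turns:
            # Length of the time window shared by the segment and the turn
            # (negative when they do not overlap at all).
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = turn["speaker"], overlap
        labeled.append((best_speaker, seg["text"].strip()))
    return labeled


def format_transcript(labeled):
    """Merge consecutive segments from the same speaker into 'Speaker N: text' lines."""
    grouped = []
    for speaker, text in labeled:
        if grouped and grouped[-1][0] == speaker:
            grouped[-1][1].append(text)
        else:
            grouped.append((speaker, [text]))
    return [f"{speaker}: {' '.join(parts)}" for speaker, parts in grouped]


# Hypothetical data: timestamped transcript segments and diarization turns.
segments = [
    {"start": 0.0, "end": 2.1, "text": " Hey, have you seen my keys anywhere?"},
    {"start": 2.4, "end": 5.0, "text": " I think I saw them on the kitchen counter earlier."},
    {"start": 5.1, "end": 6.3, "text": " Let me check again."},
]
turns = [
    {"start": 0.0, "end": 2.2, "speaker": "Speaker 1"},
    {"start": 2.3, "end": 6.5, "speaker": "Speaker 2"},
]

transcript_lines = format_transcript(assign_speakers(segments, turns))
for line in transcript_lines:
    print(line)
```

Each resulting line could then be written to the .docx the same way the plain transcript is written now. What I cannot work out is where the speaker turns should come from in the first place.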