In an online meeting on a platform such as Google Meet or Zoom, I want to detect speaker changes and then transcribe the audio separately for each speaker.
I am using the DeepSpeech model for speech-to-text. I have fine-tuned the model for Indian-accented English, and I now want to add speaker diarization to it. Is there a way to do this? I don't need to identify speakers by name; I just want to find which parts of the audio were spoken by different speakers.
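
To make the goal concrete, a minimal sketch of the intended pipeline might look like the following. It assumes some diarization step (not shown, and not part of DeepSpeech) has already produced `(start_sec, end_sec, speaker_label)` segments with anonymous labels. The names `transcribe_by_speaker`, `Segment`, and the `transcribe` callback are hypothetical and used only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# (start_sec, end_sec, anonymous speaker label such as "spk0")
# Hypothetical shape; a real diarizer may return something different.
Segment = Tuple[float, float, str]


def transcribe_by_speaker(
    samples: Sequence,
    sample_rate: int,
    segments: List[Segment],
    transcribe: Callable[[Sequence], str],
) -> List[Tuple[str, float, float, str]]:
    """Slice the audio at diarization boundaries and transcribe each slice.

    `transcribe` is any speech-to-text callable that accepts a chunk of
    samples and returns text (assumption: it can handle short chunks).
    """
    results = []
    for start, end, speaker in sorted(segments):
        # Convert seconds to sample indices, clamped to the audio length.
        lo = max(0, int(start * sample_rate))
        hi = min(len(samples), int(end * sample_rate))
        if hi <= lo:
            continue  # skip empty or out-of-range segments
        results.append((speaker, start, end, transcribe(samples[lo:hi])))
    return results
```

With DeepSpeech, the `transcribe` argument could plausibly be the model's `stt` method applied to 16-bit mono audio at the sample rate the model expects, but that wiring is an assumption here. The open part of the question is how to obtain the `segments` list.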