
In an online meeting such as Google Meet or Zoom, I want to detect when the speaker changes and then transcribe the audio separately for each speaker.

I am using the DeepSpeech model for speech-to-text. I have fine-tuned it for Indian-accented English, but now I want to add speaker diarization. Is there a way to do this? I don't need to identify users by name; I just want to find which parts of the audio were spoken by different speakers.



DeepSpeech does not include any speaker recognition functionality. To add it, you would have to change the model architecture significantly and re-train the model.
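Rather than re-training, you can run a separate diarization step alongside the speech-to-text model. The sketch below is plain Python and shows only the clustering idea behind diarization. It assumes you already have one speaker embedding per fixed-length audio window from some speaker-embedding model (DeepSpeech does not provide this). The embeddings and the `threshold` and `window_sec` values are hypothetical, so tune them for your own audio.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_windows(embeddings: List[Vector], threshold: float = 0.75) -> List[int]:
    """Greedy online clustering of per-window speaker embeddings (a simple assumed
    approach). Each window joins the most similar existing speaker centroid if
    the similarity reaches `threshold`; otherwise a new speaker is started."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for emb in embeddings:
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            # Update the running mean of the matched speaker's centroid.
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] += 1
            labels.append(best)
        else:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


def to_turns(labels: List[int], window_sec: float) -> List[Tuple[float, float, str]]:
    """Merge consecutive windows with the same label into (start, end, speaker) turns.
    Each boundary between turns is a detected speaker change."""
    turns: List[Tuple[float, float, str]] = []
    for i, label in enumerate(labels):
        start, end = i * window_sec, (i + 1) * window_sec
        name = f"SPEAKER_{label}"
        if turns and turns[-1][2] == name:
            turns[-1] = (turns[-1][0], end, name)
        else:
            turns.append((start, end, name))
    return turns
```

For example, five 1.5-second windows with toy 2-D embeddings `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [1, 0.05]]` get the labels `[0, 0, 1, 1, 0]`. `to_turns` then merges them into three turns. Production diarization tools typically use more robust offline clustering, but the structure is the same: embed, cluster, then merge into speaker turns.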

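You may wish to look at Whisper from OpenAI. It is an end-to-end model trained for several tasks at once, including multilingual speech recognition, speech translation and language identification. Note that Whisper does not label speakers itself, so you would still need a separate diarization step to split its transcript by speaker.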

https://openai.com/blog/whisper/
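Whisper's output includes timestamps but no speaker labels, so the two results have to be aligned. The sketch below assigns each transcript segment to the speaker turn it overlaps most. It assumes segments are dicts with `start`, `end` and `text` keys, which is the shape of the `segments` list returned by `model.transcribe(...)` in the `openai-whisper` Python package. Speaker turns are assumed to come from a separate diarization tool as hypothetical `(start_sec, end_sec, label)` tuples.

```python
from typing import Dict, List, Tuple

# Hypothetical turn format from a separate diarization step: (start_sec, end_sec, label)
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the overlap between two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Dict], turns: List[Turn]) -> List[Dict]:
    """Attach a 'speaker' key to each transcript segment, chosen as the turn with
    the largest time overlap. Segments overlapping no turn get 'UNKNOWN'."""
    labelled = []
    for seg in segments:
        best_label, best_overlap = "UNKNOWN", 0.0
        for start, end, label in turns:
            ov = overlap(seg["start"], seg["end"], start, end)
            if ov > best_overlap:
                best_label, best_overlap = label, ov
        # Copy the segment so the caller's transcript is not mutated.
        labelled.append({**seg, "speaker": best_label})
    return labelled
```

For example, a segment spanning 2.5 to 5.0 seconds overlaps `SPEAKER_0` (0.0 to 3.0) by 0.5 seconds and `SPEAKER_1` (3.0 to 6.0) by 2.0 seconds, so it is labelled `SPEAKER_1`. Segment-level assignment is coarse when speakers change mid-sentence; word-level timestamps would give finer alignment.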

Kathy Reid