
I'm trying to build a speaker recognition (not speech recognition) system in Python. I've extracted MFCC features from both the training audio file and the test audio file and fitted a GMM to each. I'm not sure how to compare the two models to compute a similarity score that the system can use to validate the test audio. I've been struggling with this for 4 days and would be glad if someone could help.
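To make the scoring step concrete: one common approach, offered here only as a sketch and not necessarily what the asker's code should look like, is to skip fitting a GMM on the test audio and instead score the test MFCC frames under the GMM trained on the enrolled speaker, then compare that with the score under a second "background" GMM. The function names, the `(weights, means, variances)` tuple layout, and the diagonal-covariance assumption below are all hypothetical choices for illustration; real MFCC frames would replace the toy lists.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_avg_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of frames under a diagonal GMM."""
    total = 0.0
    for x in frames:
        # Log of (weight * density) for every mixture component.
        comps = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Log-sum-exp keeps the sum numerically stable.
        peak = max(comps)
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(frames)


def verification_score(frames, speaker_gmm, background_gmm):
    """Log-likelihood ratio: positive values favour the enrolled speaker."""
    return (gmm_avg_log_likelihood(frames, *speaker_gmm)
            - gmm_avg_log_likelihood(frames, *background_gmm))


# Toy 2-D "MFCC" frames and hand-written models (assumed values, not real data).
speaker = ([1.0], [[0.0, 0.0]], [[1.0, 1.0]])
background = ([1.0], [[5.0, 5.0]], [[1.0, 1.0]])
test_frames = [[0.1, -0.1], [0.0, 0.2]]
score = verification_score(test_frames, speaker, background)
```

The test audio would be accepted when `score` exceeds a threshold tuned on held-out recordings. If the models were fitted with scikit-learn's `GaussianMixture`, its `score` method returns the same kind of average per-sample log-likelihood, so the hand-written likelihood function would not be needed.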

Ubdus Samad
Tilak Sharma

1 Answer


From what I understand of the question, you are describing an aspect of the cocktail party problem. I have found a whitepaper with a solution to your problem: it uses a modified iterative Wiener filter and a multi-layer perceptron neural network to separate speakers into separate channels.

Interestingly, the cocktail party problem can be solved in one line in Octave: [W,s,v]=svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
You can read more about it in this Stack Overflow post.

James Burgess