I'm trying to build a speaker recognition (not speech recognition) system in Python. I've extracted MFCC features from both the training audio files and the test audio file, and have fitted a GMM to each. I'm not sure how to compare the models to compute a similarity score, based on which I can program the system to validate the test audio. I've been struggling with this for four days; I'd be glad if someone could help.

asked by Tilak Sharma
- Please provide more details and show your efforts (if any). – Ubdus Samad Apr 22 '18 at 08:01
- I'm taking 3 audio files to model a training set (.gmm), then taking one more audio clip (the test clip) and comparing it with the training model to compute the similarity. – Tilak Sharma Apr 22 '18 at 08:06
- Possible duplicate of [Python Speaker Recognition](https://stackoverflow.com/questions/7309219/python-speaker-recognition) – Nikolay Shmyrev Apr 22 '18 at 15:06
1 Answer
From what I can understand from the question, you are describing an aspect of the cocktail party problem. I have found a whitepaper with a solution to your problem that uses a modified iterative Wiener filter and a multi-layer perceptron neural network to separate the speakers into separate channels.

Interestingly, the cocktail party problem can be solved in one line of Octave: `[W,s,v]=svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');`

You can read more about it in this Stack Overflow post.
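Since the question is about Python, that Octave one-liner can be transcribed to NumPy roughly as follows. This is a sketch under the assumption that `x` is a 2-D array with one mixed signal per row; NumPy broadcasting stands in for Octave's `repmat`, and `@` replaces the matrix product `*`:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two mixed signals, one per row (e.g. two microphones, 500 samples each).
x = rng.normal(size=(2, 500))

# Octave: [W,s,v] = svd((repmat(sum(x.*x,1), size(x,1), 1) .* x) * x');
# sum(x*x, axis=0) is a per-sample weight, broadcast across the rows of x.
W, s, v = np.linalg.svd((np.sum(x * x, axis=0) * x) @ x.T)
```

`W` then plays the role of the unmixing matrix from the original line; the rows of `W @ x` approximate the separated sources.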

answered by James Burgess