I am trying to classify animal subjects with similar genotypes into 4 classes. The data are labeled, so we know the genotype assigned to each measured subject. Using a Random Forest classifier, I get 97% test accuracy with no sign of over- or underfitting. However, the genotypes are not fully distinct in reality, and there may be some interrelation/covariance between them. So, instead of assigning a single genotype to each new instance, I would like to estimate the probability that a new instance belongs to each of the four classes (for example, 80% class 1, 10% class 2, 10% class 3).
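For reference, scikit-learn's `RandomForestClassifier` can already return per-class probabilities through `predict_proba` instead of a single label. The sketch below is minimal and rests on stated assumptions: synthetic data from `make_classification` stands in for the real genotype measurements, and the feature count and hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the genotype data: 400 subjects, 10 features, 4 classes
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# One row per test subject, one column per class (column order follows clf.classes_).
# Each row is the average of the trees' class probabilities, so it sums to 1.
proba = clf.predict_proba(X_test)
print(clf.classes_)
print(np.round(proba[:3], 2))
```

Random Forest probabilities are not always well calibrated. If the exact percentages matter, scikit-learn's `CalibratedClassifierCV` can be wrapped around the forest.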
I have just learned about the Gaussian Mixture Model (GMM) in scikit-learn. So my questions are: first, would a GMM be an appropriate method for this problem? Second, what other algorithms could help?
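A caveat on using GMM here: fitting a single `GaussianMixture(n_components=4)` to all the data ignores the labels, and its components need not line up with the four genotypes. One way to keep the labels is a generative classifier that fits one GMM per class and combines the class-conditional likelihoods with class priors via Bayes' rule. The sketch below illustrates that technique. It assumes synthetic data and two components per class, and both choices are hypothetical, not tuned for the real data.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.datasets import make_classification
from sklearn.mixture import GaussianMixture

# Hypothetical stand-in data: 400 subjects, 10 features, 4 classes
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
classes = np.unique(y)

# Fit one GMM per class on that class's samples only (2 components is an assumption)
models = [GaussianMixture(n_components=2, covariance_type="full",
                          random_state=0).fit(X[y == c])
          for c in classes]

# Class priors estimated from label frequencies
log_prior = np.log(np.array([np.mean(y == c) for c in classes]))

# score_samples gives log p(x | class) for each sample under each class model
log_lik = np.column_stack([m.score_samples(X) for m in models])

# Bayes' rule in log space, normalized so each row is a probability vector
log_post = log_lik + log_prior
proba = np.exp(log_post - logsumexp(log_post, axis=1, keepdims=True))
print(np.round(proba[:3], 2))
```

In practice the probabilities should be evaluated on held-out data, and the number of components per class can be chosen with `GaussianMixture.bic`.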