I have 5000 data points for each of my 17 features, stored in a NumPy array of shape 5000 x 17. I am trying to find the outliers for each feature using a Gaussian mixture model, and I am confused about the following: 1) How many components should I use for my GaussianMixture? 2) Should I fit the GaussianMixture directly on the full 5000 x 17 array, or on each feature column separately, resulting in 17 GaussianMixture models?
from sklearn import mixture
clf = mixture.GaussianMixture(n_components=1, covariance_type='full')
clf.fit(full_feature_array)
or
clf = mixture.GaussianMixture(n_components=17, covariance_type='full')
clf.fit(full_feature_array)
or
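clf = {}  # one fitted model per feature column
for feature in range(full_feature_array.shape[1]):
    clf[feature] = mixture.GaussianMixture(n_components=1, covariance_type='full')
    # fit() expects a 2-D input, so keep the column as shape (5000, 1)
    clf[feature].fit(full_feature_array[:, [feature]])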
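
To make the two options concrete, here is a minimal sketch of how either fit could be turned into outlier flags. It rests on assumptions that are not part of the setup above: an "outlier" is taken to be a point with a low log-likelihood under the fitted mixture (via `score_samples`), the 1% cutoff is an arbitrary illustrative choice, and the data is a synthetic stand-in for the real 5000 x 17 array.

```python
import numpy as np
from sklearn import mixture

# Synthetic stand-in for the real 5000 x 17 array (assumption, not the actual data).
rng = np.random.default_rng(0)
full_feature_array = rng.normal(size=(5000, 17))

# Option A: one joint model over all 17 features.
joint = mixture.GaussianMixture(n_components=1, covariance_type='full', random_state=0)
joint.fit(full_feature_array)
joint_scores = joint.score_samples(full_feature_array)  # log-likelihood per row

# Hypothetical cutoff: flag the 1% least likely rows as outliers.
joint_outliers = joint_scores < np.percentile(joint_scores, 1)

# Option B: one model per feature, each fitted on an (n_samples, 1) column.
per_feature_outliers = np.zeros(full_feature_array.shape, dtype=bool)
for feature in range(full_feature_array.shape[1]):
    column = full_feature_array[:, [feature]]
    model = mixture.GaussianMixture(n_components=1, random_state=0).fit(column)
    scores = model.score_samples(column)
    per_feature_outliers[:, feature] = scores < np.percentile(scores, 1)
```

Option A yields one flag per row (the row is unusual as a whole), while Option B yields one flag per row and feature, which is closer to "outliers for each feature". In both cases `n_components` is the number of Gaussian clusters in the data, not the number of features.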