I have some 2D data (GPS data) with clusters (stop locations) that I know resemble Gaussians with a characteristic standard deviation (proportional to the inherent noise of GPS samples). The figure below visualizes a sample that I expect has two such clusters. The image is 25 meters wide and 13 meters tall.
The `sklearn` library provides the class `sklearn.mixture.GaussianMixture`, which lets you fit a mixture of Gaussians to data. Its `covariance_type` parameter lets you constrain the shape of the Gaussians. For example, `'tied'` makes all components share a single covariance matrix, which is still estimated from the data.
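To illustrate what `covariance_type` controls, here is a small sketch on synthetic data. The cluster centers, sample sizes, and seed are made up for illustration and are not my real GPS data. It only shows how the shape of the fitted `covariances_` attribute changes with each option:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for two stop locations (assumed values, not real GPS data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0], 3.0, size=(200, 2)),
    rng.normal([12.0, 4.0], 3.0, size=(200, 2)),
])

shapes = {}
for cov_type in ("full", "tied", "diag", "spherical"):
    gm = GaussianMixture(n_components=2, covariance_type=cov_type, random_state=0)
    gm.fit(X)
    # The covariances are always estimated; only their structure is constrained.
    shapes[cov_type] = gm.covariances_.shape

print(shapes)
```

In every case the covariance is still a fitted quantity, which is exactly what I would like to avoid.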
However, there does not appear to be a direct way to hold the covariance matrices fixed at a known value during fitting. From the `sklearn` source code it looks trivial to make a modification that enables this, but opening a pull request for it feels excessive (and I don't want to accidentally introduce bugs in `sklearn`). Is there a better way to fit a mixture to data where the covariance matrix of each Gaussian is fixed?
I want to assume that the SD should remain constant at around 3 meters for each component, since that is roughly the noise level of my GPS samples.
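For concreteness, this is roughly the kind of fit I have in mind: a minimal EM sketch written directly in NumPy, assuming every component has the fixed isotropic covariance sigma^2 * I, so that only the means and mixing weights are estimated. The function name `fit_fixed_cov_gmm` and its parameters are hypothetical (this is not sklearn code), and I have not checked it beyond toy data:

```python
import numpy as np

def fit_fixed_cov_gmm(X, n_components, sigma, n_iter=200, tol=1e-6, seed=0):
    """EM for a Gaussian mixture with all covariances fixed to sigma**2 * I.

    Only the means and mixing weights are updated. Hypothetical helper,
    not part of sklearn.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]

    # Initialise means at randomly chosen data points, weights uniform.
    means = X[rng.choice(n_samples, size=n_components, replace=False)].copy()
    weights = np.full(n_components, 1.0 / n_components)

    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log(weight_k * N(x | mean_k, sigma^2 I)), dropping the
        # normalising constant, which is identical for every component.
        sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_p = np.log(weights) - 0.5 * sq_dist / sigma**2
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])  # responsibilities

        # M-step: update means and weights; the covariance is left untouched.
        nk = resp.sum(axis=0) + 1e-12  # guard against empty components
        means = (resp.T @ X) / nk[:, None]
        weights = nk / n_samples

        # Log-likelihood up to an additive constant, used only for convergence.
        ll = log_norm.sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    return weights, means, resp

# Toy usage on synthetic data (assumed cluster centers, SD of 3 meters).
rng = np.random.default_rng(1)
X_demo = np.vstack([
    rng.normal([0.0, 0.0], 3.0, size=(200, 2)),
    rng.normal([12.0, 4.0], 3.0, size=(200, 2)),
])
weights, means, resp = fit_fixed_cov_gmm(X_demo, n_components=2, sigma=3.0)
print(weights)
print(means)
```

This works on toy data, but I would prefer something built on an existing, tested implementation rather than maintaining my own EM loop.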