I want to perform mean-shift clustering in R and found out that there are at least two packages that have this functionality: MeanShift
and meanShiftR
. As showed here the latter is much faster and as I tried out the first one and it took a long time to perform a clustering, I'm keen on choosing meanShiftR
. However meanShiftR::meanShift
function has rather uncommon way of bandwidth specification, see part of documentation:
queryData A matrix or vector of points to be classified by the mean shift algorithm. Values must be finite and non-missing.
bandwidth A vector of length equal to the number of columns in the queryData matrix, or length one when queryData is a vector. This value will be used in the kernel density estimate for steepest ascent classification. The default is one for each dimension.
I'm not an expert in mean-shift clustering, but the only banwidth specifications I have found in the literature is that bandwidth is scalar or positive definite, symmetric matrix, not a vector. So is this the technical trick to represent the bandwidth and the value of bandwidth have to be the same for each dimension? Or maybe it can vary?
The other issue is that even setting the same value of bandwidth in meanShiftR package as in MeanShift::msClustering, but just replicated to match the number of columns, I've obtained totally different results, in particular much larger number of cluster. Also, the modes were rather very similar and not representative of the dataset. That made me wonder if this package works correct. Have someone even used meanShiftR
? If so, maybe you could present any example as the documentation is not clear enough for me?