Grid search in clustering

Question

I am using grid search with the silhouette score as the scoring function, but for some algorithms (e.g. DBSCAN) it selects a configuration with only 1 cluster because that configuration has the highest score. For example, when I performed image clustering with the default sklearn DBSCAN, I got a silhouette score of -0.03 and 30+ well-defined clusters, but grid search returned a higher silhouette score of around 0.123 with only 1 cluster. How can I best tune the hyperparameters of my clustering algorithms using grid search?

Update: I am sharing a snippet of my code. I based it on the question "Scikit Learn GridSearchCV without cross validation (unsupervised learning)".

This is the score function:

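from sklearn import metrics

def cv_silhouette_scorer(estimator, X, y=None):
    # Refit on the data passed in; clusterers have no separate test phase.
    estimator.fit(X)
    # Most clusterers expose labels_ after fitting; otherwise fall back to predict().
    if hasattr(estimator, "labels_"):
        cluster_labels = estimator.labels_
    else:
        cluster_labels = estimator.predict(X)
    num_labels = len(set(cluster_labels))
    num_samples = len(X)  # works for DataFrames and NumPy arrays alike
    # The silhouette score is undefined for a single label or one label per sample.
    if num_labels == 1 or num_labels == num_samples:
        return -1
    else:
        return metrics.silhouette_score(X, cluster_labels)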
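
One possible reason a single-cluster result can still get through the check above, offered as an assumption rather than a confirmed diagnosis: DBSCAN labels noise points as -1, so one real cluster plus noise counts as two distinct labels. A minimal sketch of a noise-aware variant follows. The name noise_aware_silhouette_scorer and the choice to score only the non-noise points are illustrative assumptions, not part of the original code.

```python
import numpy as np
from sklearn import metrics
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs


def noise_aware_silhouette_scorer(estimator, X, y=None):
    """Hypothetical scorer: ignore DBSCAN noise (label -1) when scoring."""
    X = np.asarray(X)
    labels = np.asarray(estimator.fit_predict(X))
    mask = labels != -1                      # keep only points assigned to a cluster
    n_clusters = len(set(labels[mask]))      # real clusters, noise excluded
    # silhouette_score needs between 2 and n_samples - 1 distinct labels
    if n_clusters < 2 or n_clusters >= mask.sum():
        return -1.0
    return float(metrics.silhouette_score(X[mask], labels[mask]))


# Synthetic demo data (an assumption; not the image features from the question)
X_demo, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                       cluster_std=0.5, random_state=0)
print(noise_aware_silhouette_scorer(DBSCAN(eps=1.5, min_samples=5), X_demo))
print(noise_aware_silhouette_scorer(DBSCAN(eps=100.0), X_demo))  # one cluster only
```

Scoring only the clustered points can reward settings that discard most of the data as noise, so it may be worth also checking the fraction of points labelled -1 before trusting the best score.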

This is the grid search function:

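from sklearn.model_selection import GridSearchCV

def runGridSearch(estimator, params_dict, train_data):
    # A single "split" that uses the full data for both fitting and scoring,
    # which effectively disables cross-validation.
    cv = [(slice(None), slice(None))]
    gs = GridSearchCV(estimator=estimator, param_grid=params_dict, scoring=cv_silhouette_scorer, cv=cv, n_jobs=-1)
    gs.fit(train_data)
    # best_estimator_ has been refit on train_data with the best parameters.
    try:
        predicted_labels = gs.best_estimator_.labels_
    except AttributeError:
        predicted_labels = gs.predict(train_data)
    return predicted_labels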
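
Since no cross-validation is involved, the same search could also be written as a plain loop over ParameterGrid, which makes it easy to record the number of clusters next to each score and spot single-cluster winners. This is only a sketch under stated assumptions: manual_cluster_search, the parameter values, and the synthetic data are hypothetical and not part of the original code.

```python
import numpy as np
from sklearn import metrics
from sklearn.base import clone
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.model_selection import ParameterGrid


def manual_cluster_search(estimator, param_grid, X):
    """Hypothetical helper: score every parameter combination, best first."""
    X = np.asarray(X)
    results = []
    for params in ParameterGrid(param_grid):
        labels = clone(estimator).set_params(**params).fit_predict(X)
        n_labels = len(set(labels))            # distinct labels, noise included
        n_clusters = len(set(labels) - {-1})   # real clusters, noise (-1) excluded
        if 2 <= n_labels < len(X):
            score = float(metrics.silhouette_score(X, labels))
        else:
            score = -1.0                       # silhouette is undefined here
        results.append({"params": params, "score": score, "n_clusters": n_clusters})
    return sorted(results, key=lambda r: r["score"], reverse=True)


# Synthetic demo data (an assumption; not the image features from the question)
X_demo, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                       cluster_std=0.5, random_state=0)
grid = {"eps": [0.3, 1.5, 100.0], "min_samples": [5, 10]}
for row in manual_cluster_search(DBSCAN(), grid, X_demo):
    print(row)
```

Printing n_clusters beside each score shows at a glance whether the top-ranked setting is a degenerate solution.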

This might help: https://stackoverflow.com/questions/25633383/how-can-gridsearchcv-be-used-for-clustering-meanshift-or-dbscan - s3nh, Jun 28 '20 at 09:01
@hamedbaziyad I have updated the code and my problem, please have a look. - Himani Negi, Jun 28 '20 at 09:34
@s3nh Yes, I had a look at that answer, but it considers only the number of clusters and not any evaluation criterion. I am not sure whether this is the right approach. - Himani Negi, Jun 28 '20 at 09:39
