
I am using grid search with the silhouette score as the metric, but for some algorithms (e.g. DBSCAN) it returns a single cluster because that configuration has the highest score. For example, when I clustered images with scikit-learn's default DBSCAN settings, I got a silhouette score of -0.03 and 30+ well-defined clusters. When I ran the grid search, it returned a higher silhouette score of about 0.123 but only 1 cluster. How can I best tune the hyperparameters of my clustering algorithms using grid search?

Update: here is a snippet of my code. I based it on "Scikit Learn GridSearchCV without cross validation (unsupervised learning)".

This is the score function:

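from sklearn import metrics

def cv_silhouette_scorer(estimator, X):
    # Fit on the data passed in by GridSearchCV.
    estimator.fit(X)
    # DBSCAN and similar estimators expose labels_; fall back to predict() otherwise.
    if hasattr(estimator, "labels_"):
        cluster_labels = estimator.labels_
    else:
        cluster_labels = estimator.predict(X)
    num_labels = len(set(cluster_labels))
    num_samples = len(X)  # works for DataFrames and arrays alike
    # The silhouette score is undefined for one label or one label per sample.
    if num_labels == 1 or num_labels == num_samples:
        return -1
    return metrics.silhouette_score(X, cluster_labels)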
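
One possible cause of the single-cluster result, offered as a sketch and not a confirmed diagnosis: DBSCAN labels noise points as -1, so a fit with one real cluster plus noise has two distinct labels and passes the `num_labels == 1` check above. The variant below excludes noise before counting clusters and scores only the clustered points. The name `noise_aware_silhouette` and the `min_clusters` threshold are hypothetical, and the code assumes the estimator exposes `labels_` after fitting.

```python
import numpy as np
from sklearn import metrics


def noise_aware_silhouette(estimator, X, y=None, min_clusters=2):
    """Silhouette score that ignores DBSCAN noise points (label -1)."""
    estimator.fit(X)
    labels = np.asarray(estimator.labels_)
    mask = labels != -1                      # keep only clustered samples
    n_clusters = len(set(labels[mask].tolist()))
    n_kept = int(mask.sum())
    # silhouette_score needs 2 <= n_clusters <= n_samples - 1.
    if n_clusters < min_clusters or n_clusters >= n_kept:
        return -1.0
    X_kept = np.asarray(X)[mask]
    return float(metrics.silhouette_score(X_kept, labels[mask]))
```

Note that ignoring noise can favour settings that mark most points as noise, so it may be worth also checking the noise fraction before trusting the best score.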
 

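This is the grid search function:

from sklearn.model_selection import GridSearchCV

def runGridSearch(estimator, params_dict, train_data):
    # A single "split" that uses all samples for both fitting and scoring,
    # which effectively disables cross-validation.
    cv = [(slice(None), slice(None))]
    gs = GridSearchCV(estimator=estimator, param_grid=params_dict,
                      scoring=cv_silhouette_scorer, cv=cv, n_jobs=-1)
    gs.fit(train_data)
    # best_estimator_ is refit on train_data (refit=True by default).
    if hasattr(gs.best_estimator_, "labels_"):
        predicted_labels = gs.best_estimator_.labels_
    else:
        predicted_labels = gs.predict(train_data)
    return predicted_labels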
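
Since GridSearchCV reports only the score, it can help to look at the cluster count and noise fraction next to it for each parameter set. The following is a minimal sketch using a plain loop over `ParameterGrid`; the function name `scan_clusterer` and the returned dictionary keys are hypothetical, and it assumes the estimator implements `fit_predict` and uses -1 for noise, as DBSCAN does.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import silhouette_score
from sklearn.model_selection import ParameterGrid


def scan_clusterer(estimator, param_grid, X):
    """Fit one clone per parameter set; report score, cluster count, noise."""
    X = np.asarray(X)
    rows = []
    for params in ParameterGrid(param_grid):
        labels = clone(estimator).set_params(**params).fit_predict(X)
        unique = set(labels.tolist())
        n_clusters = len(unique - {-1})      # -1 is DBSCAN's noise label
        # The silhouette score is only defined for 2..n_samples-1 labels.
        if 2 <= len(unique) < len(X):
            score = float(silhouette_score(X, labels))
        else:
            score = float("nan")
        rows.append({**params,
                     "n_clusters": n_clusters,
                     "noise_frac": float(np.mean(labels == -1)),
                     "silhouette": score})
    return rows
```

The rows can then be sorted or filtered, for example keeping only settings with at least some minimum number of clusters before picking the highest silhouette score.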
