I'm trying to cluster time series. Elements within a cluster have the same shape but different scales, so I would like to use a correlation-based measure as the clustering metric. I'm trying the correlation distance or a Pearson-coefficient distance (any suggestions or alternatives are welcome). However, the following code raises an error when I run Z = linkage(dist) because there are some NaN values in dist. There are no NaN values in time_series, as confirmed by
np.any(np.isnan(time_series))
which returns False.
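import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import dendrogram, linkage
# Pairwise correlation distance (1 - Pearson r) between the rows of time_series
dist = pdist(time_series, metric='correlation')
Z = linkage(dist)  # the error is raised here because dist contains NaN
fig = plt.figure()
dn = dendrogram(Z)
plt.show()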
As an alternative, I tried using the Pearson distance:
from scipy.stats import pearsonr
def pearson_distance(a, b):
    # 1 - Pearson correlation coefficient between two series
    return 1 - pearsonr(a, b)[0]
dist = pdist(time_series, pearson_distance)
but this generates some runtime warnings and takes a long time.
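One thing that may be worth checking, as a guess rather than a confirmed diagnosis: the correlation between two series is undefined (0/0) when either of them is constant, so rows with zero variance could be a source of NaN values in dist even though time_series itself contains no NaN. Below is a minimal sketch of that check on made-up toy data. The array contents and the names `constant_rows`, `filtered`, and `dist_filtered` are assumptions introduced only for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

# Toy data (assumption): four series of length 5; row 2 is constant.
time_series = np.array([
    [1.0, 2.0, 3.0, 2.0, 1.0],
    [10.0, 20.0, 30.0, 20.0, 10.0],  # same shape as row 0, larger scale
    [5.0, 5.0, 5.0, 5.0, 5.0],       # constant: correlation is undefined
    [3.0, 1.0, 2.0, 1.0, 3.0],
])

# A row is constant when its maximum equals its minimum.
is_constant = time_series.max(axis=1) == time_series.min(axis=1)
constant_rows = np.where(is_constant)[0]
print("constant rows:", constant_rows)

# Drop the constant rows, then compute the correlation distance as before.
filtered = time_series[~is_constant]
dist_filtered = pdist(filtered, metric="correlation")
print("NaN left in dist:", np.isnan(dist_filtered).any())

# With only finite distances, linkage should accept the condensed matrix.
Z = linkage(dist_filtered)
```

If the real data does contain constant rows, they would have to be handled separately (dropped or assigned to their own cluster), since no correlation-based distance is defined for them.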