I am working with the affinity propagation algorithm and I want to write it from scratch, without using scikit-learn. I have written the responsibility and availability updates with nested for loops / list comprehensions, but each one takes more than 30 minutes to run on data with more than 2000 individuals.
from scipy.spatial.distance import euclidean, pdist, squareform
import numpy as np
import pandas as pd

def similarity_func(u, v):
    # similarity = negative euclidean distance
    return -euclidean(u, v)

csv_data = pd.read_csv("DadosC.csv", delimiter=",", encoding="utf8", engine="python")
X = csv_data[["InicialL", "FinalL"]].to_numpy().copy()  # a list, not a set: column order matters
dists = pdist(X, similarity_func)
distM = squareform(dists)  # squareform already returns an ndarray
np.fill_diagonal(distM, np.median(distM))  # preference = median similarity on the diagonal
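I suspect part of the cost is that pdist with a Python callable invokes similarity_func once per pair (about 2 million calls for 2000 points). I believe the same matrix can be built with scipy's C-implemented metric and then negated; this is only a sketch of what I mean, not verified against my data:

    # same similarities, but computed by the built-in C metric instead of a Python callback
    dists_fast = pdist(X, metric="euclidean")
    S = -squareform(dists_fast)        # negate distances to get similarities
    np.fill_diagonal(S, np.median(S))  # same median preference as above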
A = np.zeros((X.shape[0], X.shape[0]))

def Respo(A, S, i, j):
    # r(i, j) = s(i, j) - max over k != j of (a(i, k) + s(i, k))
    a_0 = np.delete(A[i], j)
    s_0 = np.delete(S[i], j)
    return S[i][j] - (a_0 + s_0).max()

Lis = [[Respo(A, distM, i, j) for i in range(X.shape[0])] for j in range(X.shape[0])]
Res = np.reshape(Lis, (X.shape[0], X.shape[0])).T
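For the responsibility step itself, I think the double loop can be replaced entirely by array operations: every row of A + S needs only its largest and second-largest values, and each entry of S subtracts the row maximum, except in the argmax column, which subtracts the second maximum instead. A sketch of what I have in mind (the helper name responsibilities_vectorized is my own, and this is untested on the full data):

    def responsibilities_vectorized(A, S):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        AS = A + S                            # fresh array, safe to modify in place
        rows = np.arange(AS.shape[0])
        idx = np.argmax(AS, axis=1)           # column of each row's maximum
        first = AS[rows, idx]                 # row-wise largest value of A + S
        AS[rows, idx] = -np.inf               # mask it out ...
        second = AS.max(axis=1)               # ... to find the second largest
        R = S - first[:, None]                # subtract the row max everywhere
        R[rows, idx] = S[rows, idx] - second  # except at the argmax column
        return R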
This is what I have. A and S are 2000x2000 arrays; A is initialized to zeros but is then updated by a similar function for the availabilities. When X is a 2000x2 array the computation takes far too long. What faster alternative can you think of?
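Edit: for completeness, the availability update I would try to vectorize the same way is the standard rule from Frey and Dueck, a(i, k) = min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k))) off the diagonal and a(k, k) = sum over i' != k of max(0, r(i', k)). The sketch below builds it from column sums; it is likewise untested:

    def availabilities_vectorized(R):
        # a(i, k) = min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        # a(k, k) = sum over i' != k of max(0, r(i', k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())   # keep r(k, k) itself, not max(0, r(k, k))
        A = Rp.sum(axis=0)[None, :] - Rp     # column sums minus each row's own term
        dA = A.diagonal().copy()             # diagonal entries are not capped at 0
        A = np.minimum(A, 0)
        np.fill_diagonal(A, dA)
        return A

I understand a damping factor is usually blended in between iterations for convergence, but that seems orthogonal to the speed question.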