I have a dataframe consisting of individual tweets (id, text, author_id, nn_list) where nn_list is a list of other tweet indices which were previously identified as potential nearest neighbours. Now I have to calculate the cosine similarity of the index and every single entry of this list by looking at the index in the tfidf matrix to compare the vectors but with my current approach this is kind of slow. The current code looks something like this:
for index, row in data_df.iterrows():
for candidate in row["nn_list"]:
candidate_cos = float("%.2f" % pairwise_distances(tfidf_matrix[candidate], tfidf_matrix[index], metric='cosine'))
if candidate_cos < nn_distance:
current_nn_candidate = candidate
nn_distance = candidate_cos
Is there a significantly faster way to calculate this?