I have a 2D array of vectors in which each row represents a document in the corpus:
array[[ 0.0 0.0 0.4583 0.6584 0.0]
...
[0.4390 0.0 0.0 0.5749 0.0]]
I have calculated the cosine similarity between each row (vector) in the 2D array and every other row like so:
# calculate semantic similarity for all pairs in one go
for i in range(Vectors.shape[0]):  # for each vector/row in the 2D array
    for j in range(i + 1, Vectors.shape[0]):  # for each row after row i
        cosine_similarities = linear_kernel(Vectors[i], Vectors[j]).flatten()
        # np.savetxt("foo.csv", cosine_similarities, delimiter=",")
        pd.DataFrame(cosine_similarities).to_csv("test_matrix.csv", mode='a')  # save into csv as a matrix
Before it is saved to the CSV, the output looks like:
[0.5748389]
[0.5847379]
...
[0.3257490]
How can I transform the output into a matrix and save that to a CSV?
The output I'm looking for is:
     0          1          ...  76
0    0.5748389  0.5847379       0.3257490
1    ...        ...        ...  ...
...
76
UPDATE: I got it working. Calling the cosine similarity function directly on the sparse matrix worked; I then converted the result to a list and then to a DataFrame. See "What's the fastest way in Python to calculate cosine similarity given sparse matrix data?" for more info.
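For anyone landing here, a minimal sketch of what the update describes, under a few assumptions: the small `Vectors` matrix below is made-up stand-in data, `similarity_matrix` is a hypothetical helper name, and scikit-learn's `cosine_similarity` is used as the similarity function (the update does not say exactly which one was called). It also skips the intermediate list and builds the DataFrame straight from the returned array.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in data (assumption): three short document vectors stored as a sparse matrix.
Vectors = csr_matrix(np.array([
    [0.0, 0.0, 0.4583, 0.6584, 0.0],
    [0.4390, 0.0, 0.0, 0.5749, 0.0],
    [0.0, 0.7, 0.0, 0.0, 0.3],
]))


def similarity_matrix(vectors):
    """Return an (n_docs x n_docs) DataFrame of pairwise cosine similarities."""
    # A single call compares every row with every other row;
    # with the default dense_output=True the result is a dense NumPy array.
    similarities = cosine_similarity(vectors)
    # Row and column labels default to 0..n_docs-1, i.e. the document indices.
    return pd.DataFrame(similarities)


similarity_df = similarity_matrix(Vectors)
# Written once, so no append mode is needed.
similarity_df.to_csv("test_matrix.csv")
```

Unlike the nested loop, this fills the whole square matrix, including the diagonal (each document compared with itself) and both triangles, so the CSV has one row and one column per document.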