I have a list of arrays and I want to calculate the cosine similarity for each combination of arrays in that list.
My full list comprises 20 arrays, each 3 x 25000. A small selection is shown below:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity,cosine_distances
C = np.array([[-127, -108, -290],
[-123, -83, -333],
[-126, -69, -354],
[-146, -211, -241],
[-151, -209, -253],
[-157, -200, -254]])
D = np.array([[-129, -146, -231],
[-127, -148, -238],
[-132, -157, -231],
[ -93, -355, -112],
[ -95, -325, -137],
[ -99, -282, -163]])
E = np.array([[-141, -133, -200],
[-132, -123, -202],
[-119, -117, -204],
[-107, -210, -228],
[-101, -194, -243],
[-105, -175, -244]])
ArrayList = (C,D,E)
My first problem is that I am getting a pairwise result for each row of each array, whereas what I want is a single result that treats each array as a whole.
For example, I tried:
scores = cosine_similarity(C,D)
scores
array([[0.98078461, 0.98258287, 0.97458466, 0.643815 , 0.71118811,
0.7929595 ],
[0.95226207, 0.95528395, 0.9428837 , 0.55905221, 0.63291722,
0.7240552 ],
[0.9363733 , 0.93972303, 0.9255921 , 0.51752531, 0.59402196,
0.68918496],
[0.98998438, 0.98903931, 0.99377116, 0.85494921, 0.8979725 ,
0.9449272 ],
[0.99335622, 0.99255262, 0.99635952, 0.84106771, 0.88619755,
0.93616556],
[0.9955969 , 0.99463213, 0.99794805, 0.82706302, 0.8738389 ,
0.92640196]])
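For context, the 6 x 6 shape seems to follow from how `cosine_similarity(X, Y)` works: it treats every row as a separate vector and returns one score per pair of rows. A small check of that reading, using only the first two rows of C and D from above (`C_small`, `D_small`, and `manual` are illustrative names):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# First two rows of C and D from the sample above
C_small = np.array([[-127, -108, -290],
                    [-123,  -83, -333]])
D_small = np.array([[-129, -146, -231],
                    [-127, -148, -238]])

# Two rows in each input give a 2 x 2 result: one score per row pair
scores_small = cosine_similarity(C_small, D_small)
print(scores_small.shape)

# Entry [i, j] is the cosine similarity between row i of C and row j of D;
# here entry [0, 0] is recomputed by hand for comparison
manual = (C_small[0] @ D_small[0]) / (
    np.linalg.norm(C_small[0]) * np.linalg.norm(D_small[0])
)
print(np.isclose(scores_small[0, 0], manual))
```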
What I am expecting is a single value, such as 0.989... (this is a made-up number).
The next challenge is how to iterate over the arrays in my list to get a pairwise result per array, something like this:
     C     D     E
C  1.0   0.97  0.95
D  0.97  1.0   0.96
E  0.95  0.95  1.0
As a beginner in Python, I am not sure how to proceed. Any help is appreciated.
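One possible way to get a single score per pair of arrays is sketched below. It assumes that "similarity of the arrays as a whole" means the cosine similarity of the flattened arrays, which requires every array to have the same shape. The names `names`, `flat`, and `result` are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

C = np.array([[-127, -108, -290], [-123, -83, -333], [-126, -69, -354],
              [-146, -211, -241], [-151, -209, -253], [-157, -200, -254]])
D = np.array([[-129, -146, -231], [-127, -148, -238], [-132, -157, -231],
              [-93, -355, -112], [-95, -325, -137], [-99, -282, -163]])
E = np.array([[-141, -133, -200], [-132, -123, -202], [-119, -117, -204],
              [-107, -210, -228], [-101, -194, -243], [-105, -175, -244]])

ArrayList = (C, D, E)
names = ["C", "D", "E"]  # labels for the rows and columns of the result

# Flatten each array into one long vector, giving one row per array
flat = np.vstack([a.ravel() for a in ArrayList])

# With a single argument, cosine_similarity compares every row with every
# other row, so the result has one score per pair of arrays
result = pd.DataFrame(cosine_similarity(flat), index=names, columns=names)
print(result)
```

A different reading would be to average the entries of `cosine_similarity(C, D)`; that generally gives a different number, so which one is appropriate depends on what the rows of each array represent.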