I have a data frame where the rows represent objects and columns are object features.
I am trying to compute the pairwise cosine distance (1 minus the cosine similarity) between the objects. When I run the code it seems to work fine; however, when I sort the distances, the closest objects all have a distance of 0. That should only be possible if their vectors pointed in the same direction (identical up to a positive scale factor), which is not the case.
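A cosine distance of 0 does not require identical vectors: any two rows that are positive multiples of each other also give 0 (up to rounding). A small sketch with made-up vectors (not the data from file.csv) using the same SciPy call as the code below:

```python
import numpy as np
from scipy.spatial.distance import pdist

# Made-up example rows, not taken from file.csv
x = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],  # exactly 2x the first row
    [3.0, 2.0, 1.0],
])
# Condensed distances in pair order (0,1), (0,2), (1,2)
d = pdist(x, metric='cosine')
print(d)  # first entry is (about) 0 even though rows 0 and 1 differ
```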
I looked into the output, and it seems that any value smaller than about 1e-16 just becomes 0 (it shows as 0 both in the terminal printout and in the CSV output).
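The cutoff around 1e-16 matches float64 resolution near 1.0 rather than a limit on how small a stored number can be. float64 holds 1e-20 without trouble, but SciPy documents the cosine distance as 1 - u·v/(‖u‖‖v‖), and for nearly parallel vectors the cosine can round to exactly 1.0 before the subtraction. A minimal NumPy-only sketch of that rounding:

```python
import numpy as np

# float64 stores tiny values without trouble
tiny = 1e-20
print(tiny != 0.0)  # True

# ...but the gap between 1.0 and the next smaller float64 is 2**-53 (about 1.1e-16)
gap_below_one = 1.0 - np.nextafter(1.0, 0.0)
print(gap_below_one)

# A cosine similarity that should be 1 - 1e-17 rounds to exactly 1.0,
# so the distance 1 - cos collapses to 0.0
cos_sim = 1.0 - 1e-17
print(cos_sim == 1.0)  # True
print(1.0 - cos_sim)   # 0.0
```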
The columns are of dtype float64.
How can I show greater precision?
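On the display side, as far as I know pandas does not hide tiny non-zero values: printing switches a column to scientific notation when it contains them, and to_csv writes full-precision values by default. So zeros that show up in both places are most likely exact zeros. A small sketch with made-up values to check this, and to force a fixed scientific format if wanted:

```python
import pandas as pd

# Made-up distances: one exact zero, one tiny non-zero value, one ordinary value
df = pd.DataFrame({'distance': [0.0, 1e-20, 0.5]})
# Force scientific notation in the terminal for this print only
with pd.option_context('display.float_format', '{:.6e}'.format):
    print(df)
# Count values that are exactly zero (real zeros, not display rounding)
n_exact_zeros = int((df['distance'] == 0.0).sum())
# Default CSV output keeps the tiny value
csv_default = df.to_csv(index=False)
# Optional: a fixed scientific format in the CSV
csv_sci = df.to_csv(index=False, float_format='%.6e')
print(n_exact_zeros)
print(csv_default)
print(csv_sci)
```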
For reference, here is the code I am running:
import pandas as pd
from scipy.spatial.distance import pdist, squareform
# Load the features and index the rows by object name
dfe = pd.read_csv('file.csv')
dfe = dfe.set_index('object')
# Replace missing values with each column's mean
dfe = dfe.fillna(dfe.mean())
# pdist(..., metric='cosine') returns 1 - cos(u, v) for every pair of rows
pairwise = pd.DataFrame(squareform(pdist(dfe, metric='cosine')),
                        columns=dfe.index, index=dfe.index)
# Reshape the square matrix into one row per (object_1, object_2) pair
long_form = pairwise.unstack()
long_form = long_form.rename_axis(['object_1', 'object_2'])
long_form = long_form.to_frame('distance').reset_index()
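If the tiny distances themselves are needed, one possible workaround (a suggestion, not something SciPy's cosine metric does) is to avoid the 1 - cos subtraction altogether. For unit-length vectors a and b, 1 - a·b = ‖a - b‖²/2, and the squared Euclidean distance is built from component differences, so it can keep values far below 1e-16. A sketch under that assumption; `stable_cosine_distances` is a hypothetical helper name, and rows whose features are all zero are not handled:

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform


def stable_cosine_distances(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cosine distance computed as ||a - b||**2 / 2 on unit rows."""
    x = df.to_numpy(dtype=np.float64)
    # Scale every row to unit length (all-zero rows would produce NaN)
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    # For unit vectors ||a - b||**2 = 2 - 2 a.b, so half of it equals 1 - cos(a, b)
    d = pdist(unit, metric='sqeuclidean') / 2.0
    return pd.DataFrame(squareform(d), index=df.index, columns=df.index)


# Made-up example: 'b' is almost parallel to 'a'
demo = pd.DataFrame([[1.0, 0.0], [1.0, 1e-9], [0.0, 1.0]],
                    index=['a', 'b', 'c'])
naive = squareform(pdist(demo.to_numpy(), metric='cosine'))
stable = stable_cosine_distances(demo)
print(naive[0, 1])           # collapses to (about) zero
print(stable.loc['a', 'b'])  # about 5e-19
```

The result has the same shape and labels as `pairwise`, so the unstack/reset_index steps above should apply to it unchanged.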