I am working with CSV files and have code that calculates the similarity between documents. Post 1 provides the code; the data and output are as follows.
The data.csv looks like this:
idx messages
112 I have a car and it is blue
114 I have a bike and it is red
115 I don't have any car
117 I don't have any bike
The output is:
id 112 114 115 117
id
112 100.0 78.0 51.0 50.0
114 78.0 100.0 47.0 54.0
115 51.0 47.0 100.0 83.0
117 50.0 54.0 83.0 100.0
Now I would like to calculate the mean and standard deviation of the lower triangle of the similarity matrix (the matrix is symmetric, so the upper and lower triangles hold the same values), excluding the diagonal (the 100.0 self-similarity entries).
I tried the pandas built-in mean and std:
df_std = df.std()
df_Mean = df.mean()
But this computes the statistics per column over all the values in the output, including the diagonal and the upper triangle.
Is there a way to calculate a single mean and standard deviation over only the entries below the diagonal?
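To make the goal concrete, here is a minimal sketch of the calculation I am after. It rebuilds the similarity matrix from the output above by hand (the name df matches my snippet, but building it inline is only for illustration) and assumes NumPy's tril_indices with k=-1 is an acceptable way to pick the entries strictly below the diagonal:

```python
import numpy as np
import pandas as pd

# Similarity matrix copied from the output above (built inline for illustration).
ids = [112, 114, 115, 117]
df = pd.DataFrame(
    [[100.0, 78.0, 51.0, 50.0],
     [78.0, 100.0, 47.0, 54.0],
     [51.0, 47.0, 100.0, 83.0],
     [50.0, 54.0, 83.0, 100.0]],
    index=ids, columns=ids,
)

# k=-1 selects positions strictly below the main diagonal,
# so the 100.0 self-similarity entries are left out.
rows, cols = np.tril_indices(len(df), k=-1)
lower = df.to_numpy()[rows, cols]  # the 6 pairwise similarities

lower_mean = lower.mean()
# ddof=1 gives the sample standard deviation, matching the pandas default;
# NumPy's own default (ddof=0) would give the population value instead.
lower_std = lower.std(ddof=1)

print(lower_mean, lower_std)
```

For the four documents above this selects six values (78, 51, 47, 50, 54, 83), so the mean is taken over each pair exactly once.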