
I am new to Python. I have this CSV data:

https://github.com/anoopkunchukuttan/indic_nlp_resources/blob/master/script/english_script_phonetic_data.csv

What I want to do is compute a cost matrix whose entries are the cosine distances between vectors, where each vector corresponds to an ITRANS value. For example:

AO = [1 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 1 0 0 0 1 0 1 0 0 1 0]

AA = [1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 0 0 1 0 0 0 1 0 1]

and so on.

Concretely, I am trying to build a pairwise cosine distance matrix over all the vectors:

    AO   AA ....
AO [ X   Y
AA [
   [

where X is the cosine distance between AO and itself, and Y is the cosine distance between AO and AA. Can anyone guide me on how to write a Python script for this?

Turing101
    Read the CSV using pandas and calculate the cosine similarity using sklearn; there is an example here: https://stackoverflow.com/questions/45387476/cosine-similarity-between-each-row-in-a-dataframe-in-python – trigonom Feb 03 '23 at 08:49
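
A minimal sketch of the approach suggested in the comment, using pandas and `sklearn.metrics.pairwise.cosine_distances`. The helper name `cosine_cost_matrix`, the label column name `"ITRANS"`, and the rule "every other numeric column is a feature" are assumptions made for illustration, not something confirmed from the CSV; if the real file has other numeric columns (offsets, code points), pass `feature_cols` explicitly. The demo uses only the two vectors quoted in the question.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_distances


def cosine_cost_matrix(df, label_col="ITRANS", feature_cols=None):
    """Return a square DataFrame of pairwise cosine distances, labelled by label_col.

    Hypothetical helper: label_col and the default feature selection are assumptions.
    """
    if feature_cols is None:
        # Assumption: every numeric column other than the label is a feature.
        feature_cols = df.drop(columns=[label_col]).select_dtypes("number").columns
    features = df[feature_cols].to_numpy(dtype=float)
    labels = df[label_col].tolist()
    # cosine distance = 1 - cosine similarity, computed for every pair of rows
    dist = cosine_distances(features)
    return pd.DataFrame(dist, index=labels, columns=labels)


# In-memory stand-in built from the two vectors in the question.
ao = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1,
      0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
aa = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
      0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1]
demo = pd.DataFrame([ao, aa], columns=[f"f{i}" for i in range(len(ao))])
demo.insert(0, "ITRANS", ["AO", "AA"])

cost = cosine_cost_matrix(demo)
print(cost.round(3))

# With a locally downloaded copy of the file (the path is an assumption):
# df = pd.read_csv("english_script_phonetic_data.csv")
# cost = cosine_cost_matrix(df)
```

The diagonal entries (X in the question) come out as 0, since every vector has zero cosine distance to itself. If a similarity matrix is wanted instead, `1 - cost` gives it.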
