I need to calculate cosine similarity on huge files that contain rows of numbers, for example:
    6 3 574
    11 1 6 575 576 321
    4 577 6 64
    69 11 6 55
    11 218 6 578 579 580 581 229 582 583 155 100 584 148 446 585
I already store the file in a matrix of strings: I split it into lines, then split each line so that each number is in its own cell.
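    // Split the buffer into lines, then each line into its numbers (one cell per number).
    string[] lines = FileBuff.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
    FileMatrix = new string[lines.Length][];
    for (int i = 0; i < lines.Length; i++)
    {
        // RemoveEmptyEntries avoids empty cells when numbers are separated by several spaces or tabs.
        FileMatrix[i] = lines[i].Split(new string[] { "\t", " " }, StringSplitOptions.RemoveEmptyEntries);
    }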
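Before any arithmetic, the string cells have to become numbers. A minimal sketch, assuming every non-empty cell holds a whole number as in the sample rows; `RowParsing` and `ToIntMatrix` are hypothetical names:

```csharp
using System;
using System.Globalization;
using System.Linq;

static class RowParsing
{
    // Converts the jagged string matrix (one cell per number) into a jagged int matrix.
    // Assumption: every non-empty cell is an integer such as "574".
    public static int[][] ToIntMatrix(string[][] fileMatrix)
    {
        return fileMatrix
            .Select(row => row
                .Where(cell => cell.Length > 0)   // skip empty cells left by repeated separators
                .Select(cell => int.Parse(cell, CultureInfo.InvariantCulture))
                .ToArray())
            .ToArray();
    }
}
```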
My question is: how do I calculate the cosine similarity of rows that have different sizes?
To calculate the numerator, both rows must have the same size (A[i]*B[i] + A[i+1]*B[i+1] + ...).
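For reference, this is the standard cosine similarity of two vectors of length n, and what it becomes when the shorter vector A (length m < n) is padded with zeros at the end:

```latex
\cos(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\;\sqrt{\sum_{i=1}^{n} B_i^2}}

% Padding: A_i = 0 for i = m+1, ..., n, so those terms add 0 to the
% numerator and 0 to the sum of A_i^2:
\cos(A, B) = \frac{\sum_{i=1}^{m} A_i B_i}{\sqrt{\sum_{i=1}^{m} A_i^2}\;\sqrt{\sum_{i=1}^{n} B_i^2}}
```

So padding with zeros is equivalent to summing the numerator only over the first m positions while still using every entry of B in its norm.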
I found this example; it is the same problem as mine, just with words instead of numbers:
    Document 1: The quick brown fox jumped over the lazy dog.
    Global order: The quick brown fox jumped over the lazy dog
    Vector for Doc 1: 1 1 1 1 1 1 1 1 1
    Document 2: The runner was quick.
    Global order: The quick brown fox jumped over the lazy dog runner was
    Vector for Doc 1: 1 1 1 1 1 1 1 1 1
    Vector for Doc 2: 1 1 0 0 0 0 0 0 0 1 1
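A minimal sketch of building the "global order" and the count vectors from the example. It assumes each number in a row plays the same role as a word (a term ID, so only how often it occurs matters, not where it appears); `GlobalOrder` and `BuildCountVectors` are hypothetical names. Terms are appended to the global order the first time they are seen, so all vectors end up the same length and earlier documents automatically get zeros at the end.

```csharp
using System;
using System.Collections.Generic;

static class GlobalOrder
{
    // Builds one count vector per document over a shared global order of terms.
    // Works for words (string) and for numeric term IDs (int) alike.
    public static List<int[]> BuildCountVectors<T>(T[][] documents, out List<T> globalOrder)
    {
        var index = new Dictionary<T, int>();   // term -> position in the global order
        globalOrder = new List<T>();
        foreach (var doc in documents)
            foreach (var term in doc)
                if (!index.ContainsKey(term))
                {
                    index[term] = globalOrder.Count;   // first occurrence defines the position
                    globalOrder.Add(term);
                }

        var vectors = new List<int[]>();
        foreach (var doc in documents)
        {
            var v = new int[globalOrder.Count];   // terms absent from this document stay 0
            foreach (var term in doc)
                v[index[term]]++;                 // count occurrences of each term
            vectors.Add(v);
        }
        return vectors;
    }
}
```

For the word example (with the periods removed and case kept, so "The" and "the" are different terms), this yields the Document 2 vector shown above and pads Document 1 to `1 1 1 1 1 1 1 1 1 0 0`. The same method accepts the numeric rows directly, e.g. `BuildCountVectors(new[] { new[] { 6, 3, 574 }, new[] { 4, 577, 6, 64 } }, out var ids)`.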
In this case, in theory, I need to pad the Document 1 vector with zeros at the end. I need help writing code that does this.
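A minimal sketch of cosine similarity for two vectors of different lengths that pads the shorter one with zeros implicitly: missing positions are treated as 0 instead of copying the array. `Cosine.Similarity` is a hypothetical helper, and returning 0 when either vector is all zeros is an assumed convention (the formula divides by zero there). This compares position i of one vector with position i of the other, so for raw rows it only makes sense if positions carry meaning; if the numbers are term IDs, as in the word example, the rows would first need to be turned into count vectors over a shared global order.

```csharp
using System;

static class Cosine
{
    // Cosine similarity of two vectors that may have different lengths.
    // The shorter vector behaves as if padded with zeros at the end:
    // padded positions add nothing to the dot product or to its norm.
    public static double Similarity(int[] a, int[] b)
    {
        int common = Math.Min(a.Length, b.Length);
        double dot = 0;
        for (int i = 0; i < common; i++)
            dot += (double)a[i] * b[i];            // only overlapping positions contribute

        double normA = 0, normB = 0;
        foreach (int x in a) normA += (double)x * x;   // each norm uses the full vector
        foreach (int x in b) normB += (double)x * x;

        if (normA == 0 || normB == 0)
            return 0;                              // assumed convention for all-zero vectors
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

With the example vectors, Document 1 as the unpadded `1 1 1 1 1 1 1 1 1` and Document 2 as `1 1 0 0 0 0 0 0 0 1 1`, the dot product is 2 and the norms are 3 and 2, so the similarity is 2 / (3 * 2) = 1/3, the same as with an explicitly padded Document 1.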