I know we can measure the "sameness" of two signals using cross-correlation, but how do we calculate the percentage of "sameness" between two pieces of text?
For example, we have: 1. "The Legend of Awesome Dog" and 2. "Dog Awesome The legend of", which are 100% the same, just shuffled.
But when either is paired with 3. "Dog awesome number 9", the sameness is only 40%.
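
To make those percentages concrete, here is a rough sketch of one way they can be reproduced. It assumes "sameness" means the fraction of distinct words in the first sentence that also appear in the second, ignoring case and word order. The function name `word_overlap` is made up for illustration, and this is only one possible definition, not necessarily the right one.

```python
def word_overlap(reference: str, candidate: str) -> float:
    """Fraction of distinct words in `reference` that also occur in `candidate`.

    Assumption: words are split on whitespace and compared case-insensitively;
    word order and repeated words are ignored.
    """
    ref_words = set(reference.lower().split())
    cand_words = set(candidate.lower().split())
    if not ref_words:
        return 0.0
    return len(ref_words & cand_words) / len(ref_words)


s1 = "The Legend of Awesome Dog"
s2 = "Dog Awesome The legend of"
s3 = "Dog awesome number 9"

print(word_overlap(s1, s2))  # 1.0 (same words, different order)
print(word_overlap(s1, s3))  # 0.4 (2 of the 5 words in sentence 1 are shared)
```

Note that this measure is not symmetric: `word_overlap(s3, s1)` divides by the 4 words of sentence 3 and gives 0.5 instead of 0.4.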