How to count/measure correlation of text in Excel?

Question

I know we can measure the "sameness" of signals using cross-correlation, but how do we calculate the percentage of "sameness" between pieces of text?

For example, we have: 1. "The Legend of Awesome Dog" and 2. "Dog Awesome The legend of", which are 100% the same, just shuffled.

But sentence 3, "Dog awesome number 9", has only about 40% sameness with sentence 1 or 2.

See [this answer](https://stackoverflow.com/a/15303672/8112776). — ashleedawg, Jan 08 '18 at 08:21

score 0 · Accepted Answer · answered Jan 21 '18 at 22:12

You are looking for approximate string matching. Microsoft offers a free Excel add-in, Fuzzy Lookup, that performs so-called fuzzy matching. It uses the Jaccard index to determine the similarity of two given values.
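To show what a Jaccard index measures on the sentences from the question, here is a minimal Python sketch. It assumes a simple word-level comparison (lower-cased, punctuation ignored, word order ignored); the add-in's own tokenization and weighting may differ, and the names `word_set` and `jaccard` are illustrative only.

```python
import re


def word_set(text: str) -> set[str]:
    """Lower-case the text and split it into a set of words (order is ignored)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard index of the two word sets: |A & B| / |A | B|."""
    sa, sb = word_set(a), word_set(b)
    if not sa and not sb:
        return 1.0  # treat two empty strings as identical
    return len(sa & sb) / len(sa | sb)


s1 = "The Legend of Awesome Dog"
s2 = "Dog Awesome The legend of"
s3 = "Dog awesome number 9"

print(jaccard(s1, s2))            # 1.0: same words, different order
print(round(jaccard(s1, s3), 3))  # 0.286: 2 shared words out of 7 distinct
```

Note that Jaccard scores sentence 3 at about 29% rather than the 40% in the question: it divides the 2 shared words ("dog", "awesome") by all 7 distinct words across both sentences, not by the 5 words of sentence 1.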

1. Format both columns as tables (select each range and press Ctrl+L).
2. Link the columns in the 'Left Columns' and 'Right Columns' sections and press the connect button in the middle.
3. Select the columns you want in the output (hold Ctrl to select several columns on either the left or the right side).
4. Make sure FuzzyLookup.Similarity is checked.
5. Set the maximum number of matches shown per string.
6. Set the threshold: the minimum similarity two strings must have before they are reported as a match.
7. Go to a new sheet and select cell A1.
8. Click the 'Go' button.
9. Select all the similarity scores and increase their decimal places to see the exact values.
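The lookup settings in these steps (a similarity score per pair, a threshold, and a maximum number of matches per string) can be mimicked outside Excel. The following is a rough Python analogue, assuming word-level Jaccard similarity; it is not Microsoft's implementation, and the function `fuzzy_lookup` and its parameters are hypothetical.

```python
import re


def word_set(text: str) -> set[str]:
    # Lower-case and split into words; word order is ignored.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    # Shared words divided by all distinct words in either string.
    sa, sb = word_set(a), word_set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def fuzzy_lookup(left: list[str], right: list[str],
                 threshold: float = 0.5, max_matches: int = 1):
    """For each left value, return up to `max_matches` right values whose
    similarity is at least `threshold`, best match first."""
    results = []
    for value in left:
        scored = [(candidate, jaccard(value, candidate)) for candidate in right]
        # Keep only candidates that clear the threshold, highest score first.
        kept = sorted((p for p in scored if p[1] >= threshold),
                      key=lambda p: p[1], reverse=True)
        for candidate, score in kept[:max_matches]:
            results.append((value, candidate, round(score, 4)))
    return results


left = ["The Legend of Awesome Dog"]
right = ["Dog Awesome The legend of", "Dog awesome number 9"]

for row in fuzzy_lookup(left, right, threshold=0.2, max_matches=2):
    print(row)
```

With a threshold of 0.2 both rows are returned (scores 1.0 and 0.2857); raising the threshold to 0.5 drops the "number 9" row, which is the same trade-off the add-in's threshold setting controls.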

See example.
