I need to calculate how much of a block of text (A) is in another block of text (B). Simple algorithms like Soundex aren't providing great results for me, as text B has additional text within it that isn't/shouldn't be in text A, which throws my figures off. I need to ensure a certain percentage of A is within B, and ignore the additions to B.

My first thought for a simple algorithm that might work well in my case would be to split A into sentences, note the total number of sentences, then search B for an instance of each sentence to provide a percentage. While this should work, it feels quite hacky, and I'm sure someone more intelligent than I has devised an algorithm to provide a better calculation on a similar principle.
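
To make the sentence-based idea concrete, here is a minimal sketch of what I have in mind. The function names (`split_sentences`, `normalize`, `sentence_containment`) are made up for illustration, and the sentence splitter is a naive assumption (it breaks after `.`, `!` or `?` followed by whitespace), so abbreviations like "e.g." would be split incorrectly.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.lower().split())


def sentence_containment(a, b):
    """Return the fraction of A's sentences that appear in B after normalization."""
    sentences = split_sentences(a)
    if not sentences:
        return 0.0
    haystack = normalize(b)
    # Count each sentence of A that occurs as a substring of B; extra text in B is ignored.
    found = sum(1 for s in sentences if normalize(s) in haystack)
    return found / len(sentences)


if __name__ == "__main__":
    a = "The cat sat. The dog ran. Birds fly."
    b = "Intro text. The cat sat. Some extra words here. Birds   fly. The end."
    print(f"{sentence_containment(a, b):.0%} of A's sentences were found in B")
```

Because only A's sentences are counted, additions to B do not lower the score. The obvious weakness is that a single changed word inside a sentence makes the whole sentence count as missing, which is part of why this feels hacky to me.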