I have n lists of large texts. For each text, I need to identify which lists it appears in (possibly all of them), along with the match accuracy (match percentage). I need an algorithm for this, preferably in .NET, but I am open to any technology.
Let me give an example:
List 1
[lots of text 1]
[lots of text 2]
[lots of text 3]
[lots of text 4]
.
.
.
List 2
[lots of text 5]
[lots of text 6]
[lots of text 2]
[lots of text 3]
[lots of text 7]
.
.
List 3
[lots of text 8]
[lots of text 2]
[lots of text 9]
[lots of text 10]
[lots of text 11]
.
.
.
After running the algorithm, I expect output like the following (the format isn't important):
[lots of text 1] --> List 1
[lots of text 2] --> List 1, List 2, List 3
[lots of text 3] --> List 1, List 2
[lots of text 4] --> List 1
.
.
.
[lots of text 11] --> List 3
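
To make the expected behaviour concrete, here is a rough sketch in Python (chosen only for brevity; the same idea should port to .NET). It assumes the match percentage is the Jaccard similarity of 3-word shingles, which is just one possible measure, and that two texts count as "the same" above a threshold of 80%. The names `shingles`, `jaccard`, `find_occurrences` and the `threshold` parameter are hypothetical, not from any library.

```python
from typing import Dict, FrozenSet, List, Tuple


def shingles(text: str, k: int = 3) -> FrozenSet[tuple]:
    """Return the set of k-word shingles of a text (lowercased, whitespace-split)."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return frozenset([tuple(words)])
    return frozenset(tuple(words[i:i + k]) for i in range(len(words) - k + 1))


def jaccard(a: FrozenSet[tuple], b: FrozenSet[tuple]) -> float:
    """Jaccard similarity of two shingle sets, in the range [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_occurrences(
    lists: Dict[str, List[str]], threshold: float = 0.8
) -> List[Tuple[str, Dict[str, float]]]:
    """Group near-identical texts and record, per group, which lists contain it.

    Returns (representative text, {list name: best match percentage}) pairs.
    The first text seen in a group is its representative and scores 100.0.
    """
    groups = []  # each entry: (representative text, its shingles, {list: pct})
    for list_name, texts in lists.items():
        for text in texts:
            sh = shingles(text)
            best, best_score = None, 0.0
            # Assumption: brute-force comparison against every known group.
            for group in groups:
                score = jaccard(sh, group[1])
                if score >= threshold and score > best_score:
                    best, best_score = group, score
            if best is None:
                groups.append((text, sh, {list_name: 100.0}))
            else:
                pct = round(best_score * 100, 1)
                best[2][list_name] = max(best[2].get(list_name, 0.0), pct)
    return [(text, found_in) for text, _, found_in in groups]


if __name__ == "__main__":
    sample = {
        "List 1": ["the quick brown fox jumps over the lazy dog"],
        "List 2": ["the quick brown fox jumps over the lazy dog"],
    }
    for text, found_in in find_occurrences(sample):
        print(text, "-->", found_in)
```

This compares every text against every group found so far, so as written it would not scale to very large inputs; it is only meant to pin down the input and output I am after.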