I have n lists of large texts and need to identify which texts exist in all lists, or the list numbers where each exists, with match accuracy

Question

I have n lists of large texts. I need to identify which texts appear in every list or, for each text, the numbers of the lists in which it appears, along with a match accuracy (match percentage). I need an algorithm for this, preferably one I can implement in .NET, though I am open to options in any technology.

Let me give an example:
List 1
[lots of text 1]
[lots of text 2]
[lots of text 3]
[lots of text 4]
.
.
.

List 2
[lots of text 5]
[lots of text 6]
[lots of text 2]
[lots of text 3]
[lots of text 7]
.
.

List 3
[lots of text 8]
[lots of text 2]
[lots of text 9]
[lots of text 10]
[lots of text 11]
.
.
.

After I run the algorithm, I seek an output like the below (format isn't important):
[lots of text 1] --> List 1
[lots of text 2] --> List 1, List 2, List 3
[lots of text 3] --> List 1, List 2
[lots of text 4] --> List 1
.
.
.
[lots of text 11]

And where are you stuck so far? There are plenty of [string searching algorithms](https://en.wikipedia.org/wiki/String_searching_algorithm) and [metrics to determine similarity](https://en.wikipedia.org/wiki/String_metric). As posed, your question is not specific enough, and requests for specific tools or libraries are off-topic. — Jeroen Mostert, Apr 10 '18 at 09:06
It would have been fine if it were between two strings. The problem is that as the number of lists and the number of texts in each list grow, the time taken to identify matches would be extremely high. That is the reason for the question: to understand the best option available, or to hear from anyone who has faced a similar situation. — ajay26581, Apr 10 '18 at 09:12
[This question](https://stackoverflow.com/q/8897593/4137916) may be related. — Jeroen Mostert, Apr 10 '18 at 09:19

score 0 · Answer 1 · answered Apr 10 '18 at 10:40

Your question is vague, but from what you said, it sounds like you want something like this:

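        using System.Collections.Generic;

        // Placeholder input (sample values only; replace with your own items).
        var listOfstuffandetc = new List<string> { "ABC", "XYZ", "hello" };

        var emptylistandStuff = new List<string>();
        var characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

        // Keep every item that occurs as a substring of `characters`.
        foreach (var item in listOfstuffandetc)
        {
            if (characters.Contains(item))
            {
                emptylistandStuff.Add(item);
            }
        }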
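For the output format shown in the question, one possible approach is an inverted index keyed on a hash of each text. This only handles exact matches (after light normalization), so it does not produce a match percentage. The sketch below is in Python, since the question is open to any technology; the function names and the dict-of-lists input shape are my own assumptions, not something from the question.

```python
import hashlib
from collections import defaultdict


def normalize(text):
    # Lowercase and collapse whitespace so trivial formatting
    # differences do not prevent a match.
    return " ".join(text.lower().split())


def index_texts(lists):
    """Map each distinct text to the names of the lists that contain it.

    `lists` is a dict of list name -> list of texts (assumed input shape).
    Every text is hashed once, so the cost grows linearly with the total
    amount of text rather than with the number of text pairs.
    """
    first_seen = {}             # digest -> first original text with that digest
    where = defaultdict(list)   # digest -> list names, in input order
    for name, texts in lists.items():
        for text in texts:
            digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
            first_seen.setdefault(digest, text)
            if name not in where[digest]:
                where[digest].append(name)
    return {first_seen[d]: names for d, names in where.items()}


# Short stand-ins for the "[lots of text N]" entries in the question.
lists = {
    "List 1": ["text 1", "text 2", "text 3", "text 4"],
    "List 2": ["text 5", "text 6", "text 2", "text 3", "text 7"],
    "List 3": ["text 8", "text 2", "text 9", "text 10", "text 11"],
}

result = index_texts(lists)
for text, names in result.items():
    print(f"{text} --> {', '.join(names)}")
```

Hashing keeps memory use small when the texts are large, because only the digest is used as the dictionary key during indexing.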

I have rephrased my question; I hope the example gives enough clarity. As an example, "[lots of text]" could mean resumes, articles, etc. – ajay26581 Apr 10 '18 at 12:41
Is there any alternative solution? – ajay26581 Apr 11 '18 at 13:04
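One possible alternative, if near matches with a match percentage are needed, is to compare texts by the Jaccard similarity of their word shingles. The sketch below is in Python, since the question is open to any technology; the function names, the shingle size `k`, and the `threshold` default are illustrative assumptions. It compares every pair of texts, so it is quadratic in the number of texts. For large collections, a technique such as MinHash with locality-sensitive hashing is commonly used to avoid comparing all pairs, but that is not shown here.

```python
def shingles(text, k=3):
    # Set of overlapping k-word sequences; texts shorter than k words
    # become a single shingle.
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_percent(a, b, k=3):
    # Jaccard similarity of the two shingle sets, as a percentage.
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 100.0
    return 100.0 * len(sa & sb) / len(sa | sb)


def find_matches(lists, threshold=80.0, k=3):
    """For each (list name, text), return {list name: best match %}.

    `lists` is a dict of list name -> list of texts (assumed input shape).
    Only lists containing a text scoring at or above `threshold` are kept.
    Brute force: every text is compared against every text.
    """
    flat = [(name, text) for name, texts in lists.items() for text in texts]
    out = {}
    for name, text in flat:
        best = {}
        for other_name, other_text in flat:
            score = jaccard_percent(text, other_text, k)
            if score >= threshold:
                best[other_name] = max(best.get(other_name, 0.0), score)
        out[(name, text)] = best
    return out


a = "the quick brown fox jumps over the lazy dog"
b = "the quick brown fox jumps over the lazy cat"
sample = {
    "List 1": [a],
    "List 2": [b],
    "List 3": ["completely different words here now"],
}

matches = find_matches(sample, threshold=70.0)
for (name, text), found in matches.items():
    summary = ", ".join(f"{n} ({p:.0f}%)" for n, p in found.items())
    print(f"[{name}] {text[:30]}... --> {summary}")
```

Larger values of `k` make the comparison stricter about word order, while smaller values tolerate more rearrangement; the right setting depends on the kind of text (resumes, articles, and so on).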
