I'm looking for general advice on how to measure the similarity between two running texts (continuous prose).
What I need is an idea or draft for an algorithm that compares two running texts and outputs how similar they are, ideally with a good runtime.
For example: text A is 90% similar to text B.
Standard checks that only test whether text A contains keywords or passages of text B aren't enough for my case.
I googled a lot, and the best I stumbled upon was text mining, but that's not really what I was looking for.
Does a common solution for this kind of problem exist, or do I need a more custom solution?
Update: An example. As I said, these are running texts, so each one contains more than one or two sentences, more likely 20-50. Here is a short example anyway.
Text A: "Lorem ipsum dolor sit amet, consectetuer adipiscing elit." Text B: "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa."
I would rate the texts as about 40-50% similar, because Text B contains all of Text A.
The algorithm should produce this rating; a deviation of less than 10 percentage points is OK! ;)
This is just a simple example for illustration, though. The texts I will actually compare are sometimes not similar to each other at all!
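To make the kind of output I mean concrete, here is a minimal baseline sketch in Python. It is only an illustration, not the solution I'm after: it scores word overlap (a multiset Jaccard measure, which is my own assumption here) and ignores word order and meaning. The function names `word_counts` and `overlap_similarity` are made up for this example.

```python
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count each word (punctuation is dropped)."""
    return Counter(re.findall(r"\w+", text.lower()))


def overlap_similarity(text_a, text_b):
    """Multiset Jaccard similarity of the two texts' words, in [0, 1]."""
    a, b = word_counts(text_a), word_counts(text_b)
    shared = sum((a & b).values())  # per-word minimum of the two counts
    total = sum((a | b).values())   # per-word maximum of the two counts
    # Assumption: two texts without any words are treated as identical.
    return shared / total if total else 1.0


text_a = "Lorem ipsum dolor sit amet, consectetuer adipiscing elit."
text_b = ("Lorem ipsum dolor sit amet, consectetuer adipiscing elit. "
          "Aenean commodo ligula eget dolor. Aenean massa.")

print(f"{overlap_similarity(text_a, text_b):.0%}")  # prints 53%
```

For the example texts this gives 8 shared words out of 15, roughly in the range I would expect, but it is exactly the kind of keyword-style check that probably won't be enough for longer, reworded texts.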