I have two text files which I'd like to compare. What I did is:
- I've split both of them into sentences.
- I've measured levenshtein distance between each of the sentences from one file with each of the sentences from second file.
I'd like to calculate average similarity between those two text files, however I have trouble to deliver any meaningful value - obviously arithmetic mean (sum of all the distances [normalized] divided by number of comparisions) is a bad idea.
How to interpret such results?
edit: Distance values are normalized.