I am trying to compare two text files of about 1 MB each in Python using difflib's SequenceMatcher. It is very slow on files of this size: the last run took up to 7 minutes.
Is there a more efficient way to do this in Python, without using hashing, that still gives the percentage or ratio of similarity between the two files?
This is my existing code:
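from difflib import SequenceMatcher

# Base names of the two files, without the ".txt" extension
f1 = input()
f2 = input()
text1 = open("./text-files/" + f1 + ".txt").read()
text2 = open("./text-files/" + f2 + ".txt").read()
# Character-level comparison of the two full texts
m = SequenceMatcher(None, text1, text2)
print(m.ratio())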
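A possible alternative, sketched here under the assumption that a line-level similarity is acceptable: SequenceMatcher can compare lists of lines instead of individual characters, which gives it far fewer elements to match. The helper name `line_similarity` is hypothetical, and the resulting ratio is not the same number as the character-level ratio above, since a line that differs by one character counts as a full mismatch.

```python
from difflib import SequenceMatcher


def line_similarity(text1: str, text2: str) -> float:
    """Return a similarity ratio computed over lines, not characters.

    Hypothetical helper for illustration only.
    """
    lines1 = text1.splitlines()
    lines2 = text2.splitlines()
    # autojunk=False turns off difflib's heuristic that treats very
    # frequent elements as junk in long sequences.
    matcher = SequenceMatcher(None, lines1, lines2, autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    a = "alpha\nbeta\ngamma\ndelta\n"
    b = "alpha\nbeta\nGAMMA\ndelta\n"
    # Three of the four lines match in each text: 2 * 3 / 8
    print(line_similarity(a, b))  # 0.75
```

SequenceMatcher also provides `quick_ratio()` and `real_quick_ratio()`, which are cheaper to compute but only return upper bounds on `ratio()`.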
Thanks