This is what I do to find all duplicate lines in a text file:
import regex  # third-party module; used here the same way as re
# capture all lines in buffer (assumes f is an already opened text file)
r = f.readlines()
# create list of all line numbers, one per line read
lines = list(range(1, len(r) + 1))
# merge both lists into [text, line number] pairs
z = [list(a) for a in zip(r, lines)]
# sort list so that identical lines end up next to each other
newsorting = sorted(z)
# put line numbers of duplicates in a list
listdoubles = []
for i in range(len(newsorting) - 1):
    # compare each line with its neighbour, skipping blank lines
    if newsorting[i][0] == newsorting[i+1][0] and not regex.search(r'^\s*$', newsorting[i][0]):
        listdoubles.append(newsorting[i][1])
        listdoubles.append(newsorting[i+1][1])
# remove any repeated line numbers
listdoubles = list(set(listdoubles))
# sort line numbers numerically
listdoubles = sorted(listdoubles, key=int)
print(listdoubles)
But it is very slow. With more than 10,000 lines it takes 10 seconds to create this list.
Is there a way to do it faster?
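
For comparison, one possible alternative is a single pass that groups line numbers by line text in a dictionary, so no sorting of the lines and no neighbour comparison is needed. This is only a sketch under assumptions: the function name find_duplicate_line_numbers and the sample data are made up for illustration, the standard re module stands in for regex, and whether this removes the slowdown on the real file has not been measured.

```python
import re
from collections import defaultdict


def find_duplicate_line_numbers(lines):
    """Return the sorted 1-based numbers of all non-blank lines that occur more than once.

    Hypothetical helper for illustration; `lines` is a list such as f.readlines() returns.
    """
    blank = re.compile(r'^\s*$')       # compiled once, reused for every line
    positions = defaultdict(list)      # line text -> list of line numbers
    for number, text in enumerate(lines, start=1):
        if not blank.search(text):     # blank lines are ignored, as in the code above
            positions[text].append(number)

    duplicates = []
    for numbers in positions.values():
        if len(numbers) > 1:           # text seen on more than one line
            duplicates.extend(numbers)
    return sorted(duplicates)


# Made-up sample input standing in for f.readlines()
sample = ["a\n", "b\n", "\n", "a\n", "\n", "b\n", "c\n"]
print(find_duplicate_line_numbers(sample))  # [1, 2, 4, 6]
```

As in the original code, lines are compared including their trailing newline, so a last line without a newline will not match an otherwise identical earlier line.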