I'm looking for a fast algorithm that searchs for the longest repeating substring (repeated at least 1 time) in a given string, with the lowest time complexity possible and (if possible) memory wise (RAM).
I've seen some implementations but most of them aren't designed for a ton of characters (let's say 4k, 400k, 4m... length
). One example is this one:
from collections import deque
def largest_substring_algo1(string):
l = list(string)
d = deque(string[1:])
match = []
longest_match = []
while d:
for i, item in enumerate(d):
if l[i]==item:
match.append(item)
else:
if len(longest_match) < len(match):
longest_match = match
match = []
d.popleft()
return ''.join(longest_match)
I've been trying with a string containing 103440816326530612244897959183673469387755102040816326530612244897959183673469387755
100 times.
It works well for small strings (<1k length) but it behaves strange for those sizes mentioned before.
EDIT: is there any way to do it without loading a (let's say 20GB) file in memory?