How to find longest common substring with less memory usage?

Question

I need to find longest common substring with greatest (length of substring * count).

For example, when I have string:

hi, hello world ... hello world ... hi, hello world

The answer is hello world, because (11 * 3) > (15 * 2).

I have found a relevant discussion in this question, but it is not practical to use in my case, because of its high memory usage.

Is there any better way how to do that?

are these suitable? http://stackoverflow.com/questions/19158025/longest-common-substring-in-two-sequences-of-strings — Stasik, Dec 30 '13 at 10:35
In the [Wikipedia article](https://en.wikipedia.org/wiki/Longest_common_substring_problem) there is a pseudocode solution using dynamic programming. — BioGeek, Dec 30 '13 at 10:39
A little. He is looking for longest substring. I'm looking for substring that takes most of space in the string (gratest length of substring * count) — Jakub Tětek, Dec 30 '13 at 10:40
"it is not practical to use this because it's high memory use". I disagree. you will insert each string in the trie, which will have a number of characters smaller than the sum of characters of each string. And you can divide the problems if there are too many strings to fit in memory — UmNyobe, Dec 30 '13 at 10:40
This question appears to be off-topic because the assumption than an existing approach on SO has "high memory usage" is neither explained nor proven — UmNyobe, Dec 30 '13 at 10:46

score 1 · Accepted Answer · answered Dec 30 '13 at 11:38

Here is a solution in perl. Probably it is not memory efficient but it works with the test string shown

use warnings;
use strict;

my $s="hi, hello world ... hello world ... hi, hello world";
my %h=();

#find the repeated strings, all of them
for (0..length($s) ) { 
    my $x=substr($s,$_); 
    for my  $m ($x=~/(.*).*\1/) { $h{$m}++} ; }

#find the count of each strings repeats
for my $f (keys %h) { $h{$f} = () = $s=~/\Q$f/g; }

#sort the length*count to find the best
my @ord=sort { length($b)*$h{$b} <=> length($a)*$h{$a} } keys %h;

#this one is the best
print $ord[0];

How to find longest common substring with less memory usage?

1 Answers1