5

I need to find the longest common substring of n strings and use the result in my project.

Is there any existing implementation/library in java which already does this?

Yu Hao
  • 119,891
  • 44
  • 235
  • 294
user1628340
  • 901
  • 4
  • 14
  • 27
  • 1
    This might be helpful http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring – Suresh Atta Jun 17 '13 at 14:48
  • @Baadshah- its for two strings... i need the implementation for n strings – user1628340 Jun 17 '13 at 14:58
  • Check this link [**Analysis of Longest common substring matching**](http://www.msccomputerscience.com/2014/10/analysis-of-longest-common-substring_18.html) – ARJUN Oct 26 '14 at 15:41

5 Answers5

10

What about concurrent-trees ?

It is a small (~100 KB) library available in Maven Central. The algorithm uses combination of Radix and Suffix Trees. Which is known to have a linear time complexity (wikipedia).

public static String getLongestCommonSubstring(Collection<String> strings) {
    LCSubstringSolver solver = new LCSubstringSolver(new DefaultCharSequenceNodeFactory());
    for (String s: strings) {
        solver.add(s);
    }
    return solver.getLongestCommonSubstring().toString();
}
dedek
  • 7,981
  • 3
  • 38
  • 68
6

We can use below code to identify longest common sub string of n Strings

public static String identifyCommonSubStrOfNStr(String [] strArr){

    String commonStr="";
    String smallStr ="";        

    //identify smallest String      
    for (String s :strArr) {
        if(smallStr.length()< s.length()){
            smallStr=s;
        }
    }

    String tempCom="";
    char [] smallStrChars=smallStr.toCharArray();               
    for (char c: smallStrChars){
        tempCom+= c;

        for (String s :strArr){
            if(!s.contains(tempCom)){
                tempCom=c;
                for (String s :strAarr){
                    if(!s.contains(tempCom)){
                        tempCom="";
                        break;
                    }
                }
                break;
            }               
        }

        if(tempCom!="" && tempCom.length()>commonStr.length()){
            commonStr=tempCom;  
        }                       
    }   

    return commonStr;
}
h.alex
  • 902
  • 1
  • 8
  • 31
user3653909
  • 71
  • 1
  • 3
0

This page pretty much gives exactly what you want in quite a few languages.

public static int longestSubstr(String first, String second) {
    if (first == null || second == null || first.length() == 0 || second.length() == 0) {
        return 0;
    }

    int maxLen = 0;
    int fl = first.length();
    int sl = second.length();
    int[][] table = new int[fl][sl];

    for (int i = 0; i < fl; i++) {
        for (int j = 0; j < sl; j++) {
            if (first.charAt(i) == second.charAt(j)) {
                if (i == 0 || j == 0) {
                    table[i][j] = 1;
                }
                else {
                    table[i][j] = table[i - 1][j - 1] + 1;
                }
                if (table[i][j] > maxLen) {
                    maxLen = table[i][j];
                }
            }
        }
    }
    return maxLen;
}
David
  • 19,577
  • 28
  • 108
  • 128
  • 1
    its for two substrings... it want it for n substrings... say if you have s1="aa111" s2="aa221" s3="1", lcs for s1 & s2 is aa , lcs for aa nad s3 is "" , but the correct answer is "1" – user1628340 Jun 17 '13 at 14:56
  • @user1628340 you can use the above code and just run it iteratively. If you have an N-element array, just run the code N-1 times, running it on the pairs 0,1; 1,2; 2,3; 3,4; ... until you're done. – sigpwned Jun 17 '13 at 14:59
  • 1
    @sigpwned that does'nt work too. say if have s1= "--a.txt", s2= "--b.txt" ,s3= "--c.swe" ... lcs of s1 and s2 is s4= .txt, lcs of s2 and s3 id s5= "--" ... in the next iteration, lcs of s4 and s5 is "", while the correct answer is "--" – user1628340 Jun 17 '13 at 15:37
  • ah, true. you'd need to track all common substrings for each pair, and compute the intersection as you go. choose the longest string from the final set; that's your lcs. – sigpwned Jun 17 '13 at 20:49
0

Well you could try to extend the version of wikipedia (http://en.wikipedia.org/wiki/Longest_common_substring_problem) for n Strings by putting it into a loop which iterates over all your Strings.

PrR3
  • 1,250
  • 1
  • 18
  • 26
  • 1
    looping wont work! say if you have s1="aa111" s2="aa221" s3="1", lcs for s1 & s2 is aa , lcs for aa nad s3 is "" , but the correct answer is "1" – user1628340 Jun 17 '13 at 14:57
  • the iteration should work on original strings, not on temporary results of other comparisons... then you get the correct solution! – PrR3 Jun 17 '13 at 15:03
  • that does'nt work too. say if have s1= "--a.txt", s2= "--b.txt" ,s3= "--c.swe" ... lcs of s1 and s2 is s4= .txt, lcs of s2 and s3 id s5= "--" ... in the next iteration, lcs of s4 and s5 is "", while the correct answer is "--" – user1628340 Jun 17 '13 at 15:40
0
public String findLongestCommonSubstring(String source, String dest) {
    int table[][] = new int[source.length() + 1][dest.length() + 1];
    for (int col = 0; col < table[0].length; col++) {
        table[0][col] = 0;
    }

    for (int row = 0; row < table.length; row++) {
        table[row][0] = 0;
    }

    int max = 0;
    int index = 0;
    for (int row = 1; row < table.length; row++) {
        char sourceChar = source.charAt(row - 1);
        for (int col = 1; col < table[row].length; col++) {
            char destChar = dest.charAt(col - 1);
            if (sourceChar == destChar) {
                table[row][col] = table[row - 1][col - 1] + 1;
                if (table[row][col] > max) {
                    max = table[row][col];
                    index = row;
                }
            } else {
                table[row][col] = 0;
            }
        }
    }
    return source.substring(index - max, index);
}
  • 2
    Welcome to Stack Overflow! While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. – Yunnosch Jan 08 '21 at 10:22