0

Need java function to find the longest duplicate substring in a string

For instance, if the input is “banana”,output should be "ana" and we have count the number of times it has appeared in this case it is 2.

The solution is as below

public class Test{
public static void main(String[] args){
System.out.println(findLongestSubstring("i like ike"));
System.out.println(findLongestSubstring("madam i'm adam"));
System.out.println(findLongestSubstring("When life hands you lemonade, make lemons"));
System.out.println(findLongestSubstring("banana"));
}

public static String findLongestSubstring(String value) {
    String[] strings = new String[value.length()];
    String longestSub = "";

    //strip off a character, add new string to array
    for(int i = 0; i < value.length(); i++){
        strings[i] = new String(value.substring(i));
    }

    //debug/visualization
    //before sort
    for(int i = 0; i < strings.length; i++){
        System.out.println(strings[i]);
    }

    Arrays.sort(strings);
    System.out.println();

    //debug/visualization
    //after sort
    for(int i = 0; i < strings.length; i++){
        System.out.println(strings[i]);
    }

    Vector<String> possibles = new Vector<String>();
    String temp = "";
    int curLength = 0, longestSoFar = 0;

    /*
     * now that the array is sorted compare the letters
     * of the current index to those above, continue until 
     * you no longer have a match, check length and add
     * it to the vector of possibilities
     */ 
    for(int i = 1; i < strings.length; i++){
        for(int j = 0; j < strings[i-1].length(); j++){
            if (strings[i-1].charAt(j) != strings[i].charAt(j)){
                break;
            }
            else{
                temp += strings[i-1].charAt(j);
                curLength++;
            }
        }
        //this could alleviate the need for a vector
        //since only the first and subsequent longest 
        //would be added; vector kept for simplicity
        if (curLength >= longestSoFar){
            longestSoFar = curLength;
            possibles.add(temp);
        }
        temp = "";
        curLength = 0;
    }

    System.out.println("Longest string length from possibles: " + longestSoFar);

    //iterate through the vector to find the longest one
    int max = 0;
    for(int i = 0; i < possibles.size();i++){
        //debug/visualization
        System.out.println(possibles.elementAt(i));
        if (possibles.elementAt(i).length() > max){ 
            max = possibles.elementAt(i).length();
            longestSub = possibles.elementAt(i);
        }
    }
    System.out.println();
    //concerned with whitespace up until this point
    // "lemon" not " lemon" for example
    return longestSub.trim(); 
}

}

Deepak
  • 2,094
  • 8
  • 35
  • 48
  • 1
    Interesting question, but have you tried something? – khachik Dec 15 '10 at 18:31
  • 4
    http://stackoverflow.com/questions/2172033/find-the-longest-repeating-string-and-the-number-of-times-it-repeats-in-a-given-s – NPE Dec 15 '10 at 18:33
  • @khachik,i dont know how to proceed – Deepak Dec 15 '10 at 18:33
  • @Aix,do you have the java function for the same,it says use a suffix tree – Deepak Dec 15 '10 at 18:34
  • @Deepak If this is homework, you should tag it as such. – NPE Dec 15 '10 at 18:35
  • Does it need to be efficient? If not, there's an easy algorithm that uses nothing but a few variables and loops. – IVlad Dec 15 '10 at 18:36
  • @IVIad,could you post the java function/algorithm for the same. – Deepak Dec 15 '10 at 18:37
  • 1
    @Deepak you can try to implement it as you imagine, or there is a [wikipedia page about it](http://en.wikipedia.org/wiki/Longest_repeated_substring_problem) – khachik Dec 15 '10 at 18:38
  • @deepak: the general structure looks fine. some pointers: (a) `curLength` is strictly not needed (it's value is always the same as `j`). (b) temp is not needed. you can just use `strings[i-1].substring(0, j)`. (c) there's no need for the `possibles`. just store the longest one seen so far. (d) if you need to ignore spaces, you need to ignore them when you compare the length. Otherwise you could get the wrong answer when there is an answer the same length as the answer with the spaces at the ends, and you choose the one with the spaces. – lijie Dec 17 '10 at 05:49
  • @lijie.could you PLEASE alter my code and let me know where i have gone wrong. – Deepak Dec 17 '10 at 09:20
  • Use Suffix Array+LCP array, and get the solution in O(n) or O(nlogn) or O(nlog^2n) depending on the implementation. It's much better than Dynamic Programming approach. Also because linear time approach has lots of constant overheads, nlogn implementation should work well. – Prem KTiw Jan 25 '17 at 13:53

2 Answers2

5

This is a common CS problem with a dynamic programming solution.

Edit (for lijie):

You are technically correct -- this is not the exact same problem. However this does not make the link above irrelevant and the same approach (w.r.t. dynamic programming in particular) can be used if both strings provided are the same -- only one modification needs to be made: don't consider the case along the diagonal. Or, as others have pointed out (e.g. LaGrandMere), use a suffix tree (also found in the above link).

Edit (for Deepak):

A Java implementation of the Longest Common Substring (using dynamic programming) can be found here. Note that you will need to modify it to ignore "the diagonal" (look at the Wikipedia diagram) or the longest common string will be itself!

  • 1
    the problem _isn't_ longest common substring. at least the mapping is not trivial. note that this problem has only 1 input string, whereas the LCS problem is getting the longest common substring between _2_ input strings. – lijie Dec 16 '10 at 00:09
  • @lijie Thanks for keeping me on my toes. I have updated the answer. –  Dec 16 '10 at 02:41
  • @pst,i need the longest duplicate substring in a string,Will your implementation return "ana" as the answer – Deepak Dec 16 '10 at 06:33
  • @lijie,will the answer return "ana" as answer – Deepak Dec 16 '10 at 08:53
  • @Deepak: yes it will. however, it (the DP solution) is not the most efficient algorithm possible (that would be suffix trees: O(n)) – lijie Dec 16 '10 at 09:06
  • @lijie,i was browing one of websites and i found useful information on the same Suffix concepts rather similar,maintain a suffix array and then sort it,compare adjacent pairs with each other and return the largest duplicate substring.how should i implement this algorithm now...can you guide me ? – Deepak Dec 16 '10 at 09:09
  • @Deepak: "this" algorithm is what algorithm? suffix tree, suffix array or what? And which part of the algorithm? construction? – lijie Dec 16 '10 at 09:18
  • @lijie : http://programmingpraxis.com/2010/12/14/longest-duplicated-substring/ ...he has explained the approach of doing this.i dont know how to write the code for the same in java – Deepak Dec 16 '10 at 09:23
1

In Java : Suffix Tree.

Thanks to the ones that have found how to solve it, I didn't know.

LaGrandMere
  • 10,265
  • 1
  • 33
  • 41