1

i have 2 sequences, for instance s=aaba and ss=aa, and i want all the way ss is in s. In this example: [0,1], [0,3] and [1,3] My code is below. It works fine, except for very long s with multiple ss. In that case i've got

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded (I already use java with -Xmx at the maximum I can…)

public static ArrayList<ArrayList<Integer>> getListIndex(String[] s, String[] ss, int is, int iss) {
    ArrayList<ArrayList<Integer>> listOfListIndex = new ArrayList<ArrayList<Integer>>();
    ArrayList<ArrayList<Integer>> listRec = new ArrayList<ArrayList<Integer>>();
            ArrayList<Integer> listI = new ArrayList<Integer>();

    if (iss<0||is<iss){
        return listOfListIndex;
    }

    if (ss[iss].compareTo(s[is])==0){

        //ss[iss] matches, search ss[0..iss-1] in s[0..is-1]
        listRec = getListIndex(s,ss,is-1,iss-1);

        //empty lists (iss=0 for instance)
        if(listRec.size()==0){
            listI = new  ArrayList<Integer>();
            listI.add(is);
            listOfListIndex.add(listI);
        }
        else{
            //adding to what we have already found
            for (int i=0; i<listRec.size();i++){
                listI = listRec.get(i);
                    listI.add(is);
                    listOfListIndex.add(listI);
            }
        }
    }
    //In all cases
    //searching ss[0..iss] in s[0..is-1]
    listRec = getListIndex(s,ss,is-1,iss);
    for (int i=0; i<listRec.size();i++){
        listI = listRec.get(i);
            listOfListIndex.add(listI);
    }

    return listOfListIndex;
}   

Is there anyway to do this more efficiently ?

fffcc
  • 13
  • 2
  • Why is the solution to the example [0,1], [0,3] and [1,3] ? – aioobe Jun 22 '10 at 16:02
  • @aioobe - What he means is that he wants the distance between each set of common characters. So if we label each character by its array index, there are a's at positions 0, 1, and 3 - and so that solution set is the set of combinations of a's. – G__ Jun 22 '10 at 16:11

3 Answers3

0

Well, the basic problem is that your algorithm is recursive. Java doesn't do tail-call optimization, so every recursive call just adds to the stack until you overflow.

What you want to do is re-structure your algorithm to be iterable so you aren't adding to the stack. Think about putting a loop (with a termination test) as the outer-most element of your method instead.

Another way to look at this problem is to break it into two steps:

  1. Capture the positions of all of the given character ('a' in your example) into a single set.
  2. All you want here is the complete set of combinations between them. Remember that the equation for number of combinations of r things chosen from n different things is:

     C(n,r) = n!/[r!(n-r)!]
    
G__
  • 7,003
  • 5
  • 36
  • 54
0

ArrayList<Integer> is quite memory-inefficient due to the overhead of the wrapper class. Using TIntArrayList from GNU Trove will probably cut down your memory usage by a factor of 3 (or even more if you're running on a 64bit JVM).

Michael Borgwardt
  • 342,105
  • 78
  • 482
  • 720
0

I doubt the recursion is the problem (think of what the maximum recursion depth is). The algorithm can be efficiently implemented by collecting the indecies of each character of s in ss in TreeSets and then simply taking the .tailSet when needing to "advance" in the string.

import java.util.*;


public class Test {

    public static Set<List<Integer>> solutions(List<TreeSet<Integer>> is, int n) {

        TreeSet<Integer> ts = is.get(0);
        Set<List<Integer>> sol = new HashSet<List<Integer>>();
        for (int i : ts.tailSet(n+1)) {
            if (is.size() == 1) {
                List<Integer> l = new ArrayList<Integer>();
                l.add(i);
                sol.add(l);
            } else 
                for (List<Integer> tail : solutions(is.subList(1, is.size()), i)) {
                    List<Integer> l = new ArrayList<Integer>();
                    l.add(i);
                    l.addAll(tail);
                    sol.add(l);
                }
        }
        return sol;
    }


    public static void main(String[] args) {
        String ss = "aaba";
        String s = "aa";

        List<TreeSet<Integer>> is = new ArrayList<TreeSet<Integer>>();

        // Compute all indecies of each character.
        for (int i = 0; i < s.length(); i++) {
            TreeSet<Integer> indecies = new TreeSet<Integer>();
            char c = s.charAt(i);
            for (int j = 0; j < ss.length(); j++) {
                if (ss.charAt(j) == c)
                    indecies.add(j);
            }
            is.add(indecies);
        }

        System.out.println(solutions(is, -1));
    }
}

Output:

[[0, 1], [1, 3], [0, 3]]
aioobe
  • 413,195
  • 112
  • 811
  • 826