I have a specified String and I need to check this String for identical parts of specified length. For example if the String is "abcdab" and the length is specified as 2, the identical parts in this string are "ab" (always looking for the most repeated one). I reworked my algorithm 4-5 times for the best performance, but in the end, if the length of the String is 1m+, it throws an Java Heap space error.
So my question is: how to solve the error, maybe there is another way of checking the identical parts, or maybe some other way how to construct the whole algorithm. I figured out 1 possible solution of this, but it works very slowly, so I'm asking only for solutions as fast as my current algorithm or perhaps, even faster ones. Here is the current code:
int length = 2;
String str = "ababkjdklfhcjacajca";
ArrayList<String> h = new ArrayList<String>();
h.add(str.substring(0, length));
ArrayList<Integer> contains = new ArrayList<Integer>();
contains.add(1);
String c;
for (int g = 1; g < str.length()-length+1; g++) {
c = str.substring(g, length+g);
for (int e = 0; e < h.size(); e++) {
if (h.get(e).charAt(0) == c.charAt(0) && h.get(e).charAt(length-1) == c.charAt(length-1)) {
if (h.get(e).equals(c)) {
contains.set(e, contains.get(e)+1);
break;
}
}
else if (e+1 == h.size()) {
h.add(c);
contains.add(1);
break;
}
}
}
ArrayList h
stores every unique part of the String, ArrayList contains represents amount of repeats of that every unique part of the string.
String c
is the main problem (java heap space points here). It gradually represents each part of the string before it gets stored in the ArrayList h
(if that c
is unique).
After that I'll just find the most repeated ones by using the ArrayLists
and print them.