I found this very challenging coding problem online which I though I'd give a try.
The general idea is that given string of text T
and pattern P
, find the occurrences of this pattern, sum up it's corresponding value and return max and min. If you want to read the problem in more details, please refer to this.
However, below is the code I've provided, it works for a simple test case, but when running on multiple and complex test cases its pretty slow, and I'm not sure where my code needs to be optimized.
Can anyone please help where im getting the logic wrong.
public class DeterminingDNAHealth {
private DeterminingDNAHealth() {
/*
* Fixme:
* Each DNA contains number of genes
* - some of them are beneficial and increase DNA's total health
* - Each Gene has a health value
* ======
* - Total health of DNA = sum of all health values of beneficial genes
*/
}
int checking(int start, int end, String pattern) {
String[] genesChar = new String[] {
"a",
"b",
"c",
"aa",
"d",
"b"
};
String numbers = "123456";
int total = 0;
for (int i = start; i <= end; i++) {
total += KMPAlgorithm.initiateAlgorithm(pattern, genesChar[i]) * (i + 1);
}
return total;
}
public static void main(String[] args) {
String[] genesChar = new String[] {
"a",
"b",
"c",
"aa",
"d",
"b"
};
Gene[] genes = new Gene[genesChar.length];
for (int i = 0; i < 6; i++) {
genes[i] = new Gene(genesChar[i], i + 1);
}
String[] checking = "15caaab 04xyz 24bcdybc".split(" ");
DeterminingDNAHealth DNA = new DeterminingDNAHealth();
int i, mostHealthiest, mostUnhealthiest;
mostHealthiest = Integer.MIN_VALUE;
mostUnhealthiest = Integer.MAX_VALUE;
for (i = 0; i < checking.length; i++) {
int start = Character.getNumericValue(checking[i].charAt(0));
int end = Character.getNumericValue(checking[i].charAt(1));
String pattern = checking[i].substring(2, checking[i].length());
int check = DNA.checking(start, end, pattern);
if (check > mostHealthiest)
mostHealthiest = check;
else
if (check < mostUnhealthiest)
mostUnhealthiest = check;
}
System.out.println(mostHealthiest + " " + mostUnhealthiest);
// DNA.checking(1,5, "caaab");
}
}
KMPAlgorithm
public class KMPAlgorithm {
KMPAlgorithm() {}
public static int initiateAlgorithm(String text, String pattern) {
// let us generate our LPC table from the pattern
int[] partialMatchTable = partialMatchTable(pattern);
int matchedOccurrences = 0;
// initially we don't have anything matched, so 0
int partialMatchLength = 0;
// we then start to loop through the text, !note, not the pattern. The text that we are testing the pattern on
for (int i = 0; i < text.length(); i++) {
// if there is a mismatch and there's no previous match, then we've hit the base-case, hence break from while{...}
while (partialMatchLength > 0 && text.charAt(i) != pattern.charAt(partialMatchLength)) {
/*
* otherwise, based on the number of chars matched, we decrement it by 1.
* In fact, this is the unique part of this algorithm. It is this part that we plan to skip partialMatchLength
* iterations. So if our partialMatchLength was 5, then we are going to skip (5 - 1) iteration.
*/
partialMatchLength = partialMatchTable[partialMatchLength - 1];
}
// if however we have a char that matches the current text[i]
if (text.charAt(i) == pattern.charAt(partialMatchLength)) {
// then increment position, so hence we check the next char of the pattern against the next char in text
partialMatchLength++;
// we will know that we're at the end of the pattern matching, if the matched length is same as the pattern length
if (partialMatchLength == pattern.length()) {
// to get the starting index of the matched pattern in text, apply this formula (i - (partialMatchLength - 1))
// this line increments when a match string occurs multiple times;
matchedOccurrences++;
// just before when we have a full matched pattern, we want to test for multiple occurrences, so we make
// our match length incomplete, and let it run longer.
partialMatchLength = partialMatchTable[partialMatchLength - 1];
}
}
}
return matchedOccurrences;
}
private static int[] partialMatchTable(String pattern) {
/*
* TODO
* Note:
* => Proper prefix: All the characters in a string, with one or more cut off the end.
* => proper suffix: All the characters in a string, with one or more cut off the beginning.
*
* 1.) Take the pattern and construct a partial match table
*
* To construct partial match table {
* 1. Loop through the String(pattern)
* 2. Create a table of size String(pattern).length
* 3. For each character c[i], get The length of the longest proper prefix in the (sub)pattern
* that matches a proper suffix in the same (sub)pattern
* }
*/
// we will need two incremental variables
int i, j;
// an LSP table also known as “longest suffix-prefix”
int[] LSP = new int[pattern.length()];
// our initial case is that the first element is set to 0
LSP[0] = 0;
// loop through the pattern...
for (i = 1; i < pattern.length(); i++) {
// set our j as previous elements data (not the index)
j = LSP[i - 1];
// we will be comparing previous and current elements data. ei char
char current = pattern.charAt(i), previous = pattern.charAt(j);
// we will have a case when we're somewhere in loop and two chars will not match, and j is not in base case.
while (j > 0 && current != previous)
// we decrement our j
j = LSP[j - 1];
// simply put, if two characters are same, then we update our LSP to say that at that point, we hold the j's value
if (current == previous)
// increment our j
j++;
// update the table
LSP[i] = j;
}
return LSP;
}
}
Cource code credit to Github