KMP Algorithm for string search?

Question

I found this very challenging coding problem online which I though I'd give a try.

The general idea is that given string of text T and pattern P, find the occurrences of this pattern, sum up it's corresponding value and return max and min. If you want to read the problem in more details, please refer to this.

However, below is the code I've provided, it works for a simple test case, but when running on multiple and complex test cases its pretty slow, and I'm not sure where my code needs to be optimized.

Can anyone please help where im getting the logic wrong.

public class DeterminingDNAHealth {


  private DeterminingDNAHealth() {
    /*
     * Fixme:
     *  Each DNA contains number of genes
     *   - some of them are beneficial and increase DNA's total health
     *   - Each Gene has a health value
     *   ======
     *   - Total health of DNA = sum of all health values of beneficial genes
     */
  }

  int checking(int start, int end, String pattern) {
    String[] genesChar = new String[] {
      "a",
      "b",
      "c",
      "aa",
      "d",
      "b"
    };
    String numbers = "123456";

    int total = 0;

    for (int i = start; i <= end; i++) {
      total += KMPAlgorithm.initiateAlgorithm(pattern, genesChar[i]) * (i + 1);
    }

    return total;
  }

  public static void main(String[] args) {

    String[] genesChar = new String[] {
      "a",
      "b",
      "c",
      "aa",
      "d",
      "b"
    };
    Gene[] genes = new Gene[genesChar.length];

    for (int i = 0; i < 6; i++) {
      genes[i] = new Gene(genesChar[i], i + 1);
    }

    String[] checking = "15caaab 04xyz 24bcdybc".split(" ");


    DeterminingDNAHealth DNA = new DeterminingDNAHealth();
    int i, mostHealthiest, mostUnhealthiest;

    mostHealthiest = Integer.MIN_VALUE;
    mostUnhealthiest = Integer.MAX_VALUE;

    for (i = 0; i < checking.length; i++) {
      int start = Character.getNumericValue(checking[i].charAt(0));
      int end = Character.getNumericValue(checking[i].charAt(1));
      String pattern = checking[i].substring(2, checking[i].length());

      int check = DNA.checking(start, end, pattern);

      if (check > mostHealthiest)
        mostHealthiest = check;
      else
      if (check < mostUnhealthiest)
        mostUnhealthiest = check;
    }

    System.out.println(mostHealthiest + " " + mostUnhealthiest);

    // DNA.checking(1,5, "caaab");
  }
}

KMPAlgorithm

public class KMPAlgorithm {

  KMPAlgorithm() {}


  public static int initiateAlgorithm(String text, String pattern) {

    // let us generate our LPC table from the pattern
    int[] partialMatchTable = partialMatchTable(pattern);

    int matchedOccurrences = 0;

    // initially we don't have anything matched, so 0
    int partialMatchLength = 0;

    // we then start to loop through the text, !note, not the pattern. The text that we are testing the pattern on
    for (int i = 0; i < text.length(); i++) {

      // if there is a mismatch and there's no previous match, then we've hit the base-case, hence break from while{...}
      while (partialMatchLength > 0 && text.charAt(i) != pattern.charAt(partialMatchLength)) {

        /*
         * otherwise, based on the number of chars matched, we decrement it by 1.
         * In fact, this is the unique part of this algorithm. It is this part that we plan to skip partialMatchLength
         * iterations. So if our partialMatchLength was 5, then we are going to skip (5 - 1) iteration.
         */
        partialMatchLength = partialMatchTable[partialMatchLength - 1];

      }


      // if however we have a char that matches the current text[i]
      if (text.charAt(i) == pattern.charAt(partialMatchLength)) {

        // then increment position, so hence we check the next char of the pattern against the next char in text
        partialMatchLength++;

        // we will know that we're at the end of the pattern matching, if the matched length is same as the pattern length
        if (partialMatchLength == pattern.length()) {
          // to get the starting index of the matched pattern in text, apply this formula (i - (partialMatchLength - 1))

          // this line increments when a match string occurs multiple times;
          matchedOccurrences++;

          // just before when we have a full matched pattern, we want to test for multiple occurrences, so we make
          // our match length incomplete, and let it run longer.
          partialMatchLength = partialMatchTable[partialMatchLength - 1];

        }
      }

    }

    return matchedOccurrences;


  }


  private static int[] partialMatchTable(String pattern) {
    /*
     * TODO
     *  Note:
     *  => Proper prefix: All the characters in a string, with one or more cut off the end.
     *  => proper suffix: All the characters in a string, with one or more cut off the beginning.
     *
     *  1.) Take the pattern and construct a partial match table
     *
     *  To construct partial match table {
     *      1. Loop through the String(pattern)
     *      2. Create a table of size String(pattern).length
     *      3. For each character c[i], get The length of the longest proper prefix in the (sub)pattern
     *         that matches a proper suffix in the same (sub)pattern
     *  }
     */

    // we will need two incremental variables
    int i, j;

    // an LSP table also known as “longest suffix-prefix”
    int[] LSP = new int[pattern.length()];


    // our initial case is that the first element is set to 0
    LSP[0] = 0;

    // loop through the pattern...
    for (i = 1; i < pattern.length(); i++) {

      // set our j as previous elements data (not the index)
      j = LSP[i - 1];


      // we will be comparing previous and current elements data. ei char
      char current = pattern.charAt(i), previous = pattern.charAt(j);

      // we will have a case when we're somewhere in loop and two chars will not match, and j is not in base case.
      while (j > 0 && current != previous)
        // we decrement our j
        j = LSP[j - 1];

      // simply put, if two characters are same, then we update our LSP to say that at that point, we hold the j's value
      if (current == previous)
        // increment our j
        j++;

      // update the table
      LSP[i] = j;


    }

    return LSP;

  }
}

Cource code credit to Github

`If you want to read the problem in more details, please refer to this.` Where is the link? — Pham Trung, Oct 19 '17 at 02:58
Apologies, I've updated the question. The link should be available . @PhamTrung — , Oct 19 '17 at 05:36
Have you taken a look at the editorial? your time complexity is O(n*m) with n is the amount of queries and m is amount of patterns, with n is 10^5 and m is 10^5, obviously it will not fit to the time limit. — Pham Trung, Oct 19 '17 at 06:50
You are correct, Im trying to optimize my time complexity? Any suggestions? @PhamTrung — , Oct 19 '17 at 09:06
My suggestion is that KMP might not be best algorithm in this instance. I have to look at something called Aho-Corasick string matching algorithm. Ill get back to you on updated code :) @PhamTrung — , Oct 19 '17 at 09:22

score 0 · Answer 1 · answered Nov 10 '17 at 16:33

You may try this KMP implementation. It is O(m+n), as KMP is intended to be. It should be a lot faster:

private static int[] failureFunction(char[] pattern) {
    int m = pattern.length;

    int[] f = new int[pattern.length];
    f[0] = 0;

    int i = 1;
    int j = 0;

    while (i < m) {
        if (pattern[i] == pattern[j]) {
            f[i] = j + 1;
            i++;
            j++;
        } else if (j > 0) {
            j = f[j - 1];
        } else {
            f[i] = 0;
            i++;
        }
    }
    return f;
}

private static int kmpMatch(char[] text, char[] pattern) {
    int[] f = failureFunction(pattern);

    int m = pattern.length;
    int n = text.length;

    int i = 0;
    int j = 0;

    while (i < n) {
        if (pattern[j] == text[i]) {
            if (j == m - 1){
                return i - (m - 1);
            } else {
                i++;
                j++;
            }
        } else if (j > 0) {
            j = f[j - 1];
        } else {
            i++;
        }
    }
    return -1;
}

KMP Algorithm for string search?

1 Answers1